3  Introduction


Increasing the enrollment of patients in clinical trials is important to making progress towards
finding more effective treatments for breast cancer. Accrual is complicated by the large number
of potential studies and by the cost and complexity of determining whether a patient meets the
necessary eligibility criteria. Under this proposal, we are developing a Web-based expert system
that can determine a patient's eligibility for clinical trials. The expert system is designed
to take into account the cost of the tests required to meet inclusion criteria and to acquire
information in the most cost-effective way possible.

Additionally, it is important to be able to easily add clinical trials to the system and remove
them from it. Trials are continually becoming available, going on suspension, or being closed to
accrual. Towards this end, we have developed a companion Web-based system that enables anyone to
enter the information required to describe the eligibility/ineligibility criteria for a clinical
trial. A newly entered trial/protocol can then be directly included in the clinical trial
assignment expert system with no expert intervention.

Finally, we have worked on methods of utilizing probabilities to order questions so that those
most likely to rule a patient out of a protocol are asked first. Recent testing has shown this is
effective.


4  Body 

In the extension (fourth) year, we have done the following. We have entered approximately
44 new patients into our system to check them for eligibility in available breast cancer protocols.
In total, they were found eligible for 78 trials to which they were not assigned. We have
added several new protocols. We have revised and published a journal paper (in the Artificial
Intelligence in Medicine journal) that shows significant potential for increasing accruals to
clinical trials using our system.

A  new  version  of  the  software  has  been  developed  that  keeps  all  of  the  eligibility  criteria  in 
memory  after  the  user  provides  some  information.  This  significantly  speeds  up  the  response  of 
the  system  and  makes  it  more  usable  clinically. 

The system has been tested by both research nurses and physicians. One of the concerns they
have expressed is that some questions are asked even though a previous answer already
implies the answer to the later question. It is possible to develop implication rules that take
care of this problem, but that requires interaction with very busy experts in the medical field.
Hence, a study has been undertaken into how to learn these implications as patient data is entered.

Using association rules, we have been able to find all of the expert-derived implications.
Further, we have discovered some new ones; for example, when information about a particular test
is available, this indicates that a biopsy has been done.

Also, a new method has been developed for minimizing the number of questions, or the amount of
data, needed to determine that a patient is ineligible. A slight modification of the
same approach provides the user of the system with an indication of how likely it is that the
patient will be eligible for any particular protocol they are exploring. This will enable them to
focus on a particular protocol or protocols as information is entered, if they wish. The results
of this work were reported in a paper at the IEEE Symposium on Computer-Based Medical Systems.




Table 1: Results of selecting clinical trials for the 187 past patients and 169 current patients.
We give the number of trial participants, selected by both the system and Moffitt clinicians, and
the number of the other eligible patients, identified by the system.

(a) Results for the 187 past patients.

Clinical Trial    Participants    Other Eligible
10822                  10               5
10840                   0              19
11072                  48              26
11378                   4              19
11992                   5               6
12100                   8              20
12101                  20              30

(b) Results for the 169 current patients.

Clinical Trial    Participants    Other Eligible
11132                   4               1
11931                   2              26
11971                   4               0
12100                   0               5
12101                  11              52
12385                   0              19
12601                   0               1
12643                  16              36
12757                   1               3
12775                  23              17


For completeness, we repeat below what was done in the third year. In the third year, we
refined the original prototype to produce version 1.4. We tested it with data from 187
retrospective patients and 169 more recent patients, including some who are currently undergoing
treatment. We extensively tested its ability to order questions associated with tests to save
dollar costs on over 300 patients. Table 1 summarizes our matching results on the past and
current patients. Patients are only evaluated for trials that are currently enrolling patients. The
trial status can change when a trial is put on suspension, closed, brought off suspension, or
initiated. It can be seen that the system finds all matches that correspond to trials in which
patients have been enrolled. For the 169 current patients, for whom extensive tests have been
done, we found 160 new matches to protocols. This is quite promising for increasing accrual.

The cost savings are shown in Table 2. We show the mean test costs with and without the
ordering heuristics. Six clinical trials have incurred selection costs; the heuristics have reduced
the costs for four of these trials, and have not affected the costs for the other two trials.

Table 2: Cost savings by test reordering.

(a) Results for the 187 past patients.

                  Mean Cost
Clinical Trial    W/O Test Reordering    With Test Reordering
10822             $70                    $11
10840             $0                     $0
11072             $209                   $60
11378             $35                    $19
11992             $0                     $0
12100             $0                     $0
12101             $0                     $0

(b) Results for the 169 current patients.

                  Mean Cost
Clinical Trial    W/O Test Reordering    With Test Reordering
11132             $0                     $0
11931             $0                     $0
11971             $192                   $192
12100             $0                     $0
12101             $0                     $0
12385             $0                     $0
12601             $36                    $3
12643             $0                     $0
12757             $107                   $107
12775             $0                     $0

There  are  now  15  protocols  available  in  the  system.  At  the  present  time,  all  breast  cancer 
protocols  at  the  Moffitt  Cancer  Center  which  are  accruing  at  least  two  patients  a  month  are 
available  through  our  system.  Our  automated  clinical  trial  updating  system  continues  to  allow 
us  to  easily  add  trials  to  the  system  [1,  2]. 

We have created a question ordering system that uses a crude probabilistic heuristic. As
patients are tested against the system over time, we keep a record of how many times each
question causes a patient to be classified as ineligible for a protocol. These results take the
form: x out of the y times that question q was asked, it directly caused a patient to be determined
ineligible for protocol p. The ratio $(x/y)_{qp}$ can be treated as the probability that question q
will cause a patient to be declared ineligible for protocol p. This value will be reasonably reliable
after the question has been asked more than 30 times. At that point, it can be used to reorder
questions: the question with the highest probability of making a patient ineligible for a trial can
be displayed first. In this way, ineligible patients are identified quickly, with a minimum
number of questions.
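
The counting heuristic described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the class and method names are hypothetical, and placing questions without a reliable estimate after the others (in their original order) is an assumption the report does not state.

```python
from collections import defaultdict

# The report treats an estimate as reliable once a question
# has been asked more than 30 times.
MIN_ASKED = 30


class QuestionStats:
    """Per (question, protocol) counts of how often an answer ruled a patient out."""

    def __init__(self):
        self.asked = defaultdict(int)     # y: times the question was asked
        self.rejected = defaultdict(int)  # x: times it directly caused ineligibility

    def record(self, question, protocol, caused_ineligibility):
        key = (question, protocol)
        self.asked[key] += 1
        if caused_ineligibility:
            self.rejected[key] += 1

    def probability(self, question, protocol):
        """Return x/y, or None while the question has been asked 30 times or fewer."""
        key = (question, protocol)
        y = self.asked[key]
        if y <= MIN_ASKED:
            return None
        return self.rejected[key] / y

    def order(self, questions, protocol):
        """Reliable questions first, highest rejection probability first.

        Questions without a reliable estimate keep their original relative
        order and come last (an assumption made for this sketch).
        """
        def sort_key(item):
            index, question = item
            p = self.probability(question, protocol)
            if p is None:
                return (1, 0.0, index)
            return (0, -p, index)

        return [q for _, q in sorted(enumerate(questions), key=sort_key)]
```

A caller would invoke `record` each time a question is answered for a protocol and call `order` before displaying the next page of questions.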

Preliminary  experiments  have  been  done  which  indicate  this  approach  does  in  fact  reduce  the 
number  of  questions  necessary  to  determine  eligibility.  Results  are  shown  in  Table  3. 

We selected 90 patients at random from our list of patients and used their data in experiments.
A ten-fold cross validation was carried out, so that in each fold the system was trained on 81
patients and tested on the remaining 9 patients. The test was done on six protocols for
which all 90 patients had been tested. As can be seen in Table 3, the probabilistic system requires
approximately 13% fewer questions to be answered to determine eligibility, on average.
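
The 81/9 split used in each fold can be sketched as below. This is only an illustration of ten-fold cross validation; the function name, the shuffling seed, and the handling of leftover patients are assumptions, not details taken from the experiments.

```python
import random


def ten_fold_splits(patients, folds=10, seed=0):
    """Yield (train, test) lists; with 90 patients each fold is 81 train / 9 test.

    If the number of patients is not divisible by `folds`, the leftover
    patients always stay in the training set (a simplification for this sketch).
    """
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // folds
    for k in range(folds):
        test = shuffled[k * size:(k + 1) * size]
        train = shuffled[:k * size] + shuffled[(k + 1) * size:]
        yield train, test
```

Each patient appears in exactly one test fold, so every patient is evaluated once by a system that never saw that patient during training.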


Table 3: Probabilistic question ordering vs. analytic question ordering.

Ten-fold cross validation: average number of questions

Protocol    Probabilistic System    Analytical System    Difference    Difference %
11931              15.35                  18.90              3.55          18.78
12100              13.85                  13.95              0.10           0.72
12101              21.65                  24.75              3.10          12.53
12521              14.75                  19.05              4.30          22.57
12601              13.90                  15.70              1.80          11.46
12777              14.40                  16.10              1.70          10.56
Average            15.65                  18.08              2.43          13.42

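
As a check on how the "Difference %" column of Table 3 can be read, the sketch below recomputes it as the reduction relative to the analytical system. The formula is inferred from the tabulated values rather than stated in the report, and the variable names are illustrative.

```python
# (probabilistic, analytical) average number of questions per protocol, from Table 3
table3 = {
    "11931": (15.35, 18.90),
    "12100": (13.85, 13.95),
    "12101": (21.65, 24.75),
    "12521": (14.75, 19.05),
    "12601": (13.90, 15.70),
    "12777": (14.40, 16.10),
}


def percent_reduction(probabilistic, analytical):
    """Reduction in questions, as a percentage of the analytical system's count."""
    return 100.0 * (analytical - probabilistic) / analytical


per_protocol = {p: round(percent_reduction(*v), 2) for p, v in table3.items()}

# The "Average" row is reproduced by averaging the two columns first
# and then taking the percentage, not by averaging the percentages.
mean_probabilistic = sum(v[0] for v in table3.values()) / len(table3)
mean_analytical = sum(v[1] for v in table3.values()) / len(table3)
overall = round(percent_reduction(mean_probabilistic, mean_analytical), 2)
```

Under this reading, the per-protocol values match the table and the overall figure is the roughly 13% reduction quoted in the text.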

Key  Research  Accomplishments: 

•  We have enhanced our prototype system to a very stable version 1.4. We have corrected the cost
functionality (mostly by getting the costs of tests correct and determining all tests that are
done in routine care) and tested it successfully.

•  Utilizing retrospective patient data and current patient data, it has been found that patients
are eligible for multiple protocols/trials. Further, with current patient data we find patients
who are eligible for trials but were not put on any trial.

•  Extensive testing of the cost functionality has been done. We have determined that in many
cases there is no possibility of saving costs. However, when it is possible, the cost mechanism
recommends questions in an order that allows eligibility to be determined at
minimal cost.

•  We have developed a method of determining probabilities that questions will show patients
are ineligible for trials. The probabilities are determined empirically while the system is in
use. We showed that using these probabilities to order questions on a query page
results in fewer questions needing to be answered.

•  We have applied data mining, via association rules, to 100 cases that have been put through
the system. This has enabled us to find what we call fact implication rules. For example,
if a question on a biopsy is answered as yes, it is clear that some surgery has been done and
it is not necessary to ask questions about whether the patient has ever had any surgery.
The ability to recover such rules will streamline the system from a usability point of view
and allow it to improve with time without requiring interviews with physicians and nurses.
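
The fact implication rules in the last item can be illustrated with a small association-rule sketch. It is restricted to rules with a single antecedent and a single consequent, and the function name, thresholds, and question identifiers are hypothetical; the report does not describe the mining procedure in this detail.

```python
from collections import Counter
from itertools import permutations


def mine_implications(cases, min_support=3, min_confidence=1.0):
    """Find rules (question_a, answer_a) -> (question_b, answer_b).

    cases: list of dicts mapping a question to the answer given for one patient.
    A rule is kept if both facts occur together in at least `min_support`
    cases and the consequent holds in at least `min_confidence` of the
    cases where the antecedent holds.
    """
    single = Counter()  # how often each fact occurs
    pair = Counter()    # how often an ordered pair of facts occurs together
    for case in cases:
        facts = set(case.items())
        single.update(facts)
        pair.update(permutations(facts, 2))

    rules = []
    for (antecedent, consequent), together in pair.items():
        confidence = together / single[antecedent]
        if together >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return sorted(rules)
```

A rule such as `("biopsy_done", "yes") -> ("prior_surgery", "yes")` with confidence 1.0 would let the system skip the surgery question once the biopsy question has been answered yes.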

Reportable Outcomes: We have had a paper [3], which is attached, published in the 2003
IEEE International Conference on Systems, Man, and Cybernetics. We have a paper about to
be published in the Artificial Intelligence in Medicine journal [4]. It is attached. A paper has
recently been published at the IEEE Symposium on Computer-Based Medical Systems. It
is attached.

We have submitted a revised proposal to the National Institutes of Health to take this system
into clinical operation at the Moffitt Cancer Center. The previous proposal was well received, but
there were concerns about the level of cooperation with the Cancer Center.

A  complete  bibliography  of  papers  from  this  grant  is: 

•  Savvas Nikiforou, Eugene Fink, Lawrence O. Hall, Dmitry B. Goldgof, and Jeffrey P.
Krischer, Knowledge Acquisition for Clinical-Trial Selection, IEEE International Conference
on Systems, Man and Cybernetics, October 2002.

•  Princeton K. Kokku, Lawrence O. Hall, Dmitry B. Goldgof, Eugene Fink, and Jeffrey P.
Krischer, A Cost-effective Agent for Clinical Trial Assignment, IEEE International Conference
on Systems, Man and Cybernetics, October 2002.

•  E. Fink, L. O. Hall, D. B. Goldgof, B. Goswami, M. Boonstra, J. P. Krischer, Experiments
on the Automated Selection of Patients for Clinical Trials, IEEE International Conf. on
Systems, Man and Cybernetics, pp. 4541-4545, Oct. 2003.

•  E. Fink, P. K. Kokku, S. Nikiforou, L. O. Hall, D. B. Goldgof, J. P. Krischer, Selection of
Patients for Clinical Trials: An Interactive Web-Based System, Artificial Intelligence in
Medicine, To Appear 2004.

•  Bhavesh D. Goswami, Lawrence O. Hall, Dmitry B. Goldgof, Eugene Fink,
and Jeffrey P. Krischer, Using Probabilistic Methods to Optimize Data Entry in Accrual
of Patients to Clinical Trials, The 17th IEEE Symposium on Computer-Based Medical
Systems, 2004.

A  web  prototype  of  the  clinical  trial  assignment  system  is  available  at 
http://morden.csee.usf.edu/moffit  with  password  available  from  the  principal  investigator. 

M.S.  Theses: 

Bhavesh  Goswami,  Computer  Science,  May  2004. 

Princeton  Kokku,  Computer  Science,  August  2003. 

Savvas  Nikiforou,  Computer  Science,  May  2002. 

5  Conclusions 

We have developed a scalable prototype which currently can determine eligibility for sixteen
breast cancer clinical trials. The system has been tested using retrospective data from 201
patients who were assigned to some clinical trial and, more recently, 213 active patients.
The system correctly finds cases in which a patient is eligible for multiple clinical trials. It has
found 240 matching trials for the 213 current patients. This indicates that it is quite likely that
the use of the system will significantly increase accruals to trials.

The  system  is  able  to  utilize  monetary  cost  in  requesting  tests  to  rule  in/rule  out  a  patient 
from  the  set  of  available  clinical  trials.  The  default  ordering  of  questions  allows  the  system  user 
to  rapidly  determine  the  eligibility  or  ineligibility  of  a  patient  for  any  subset  of  the  available 
clinical  trials  entered  into  the  system.  We  have  been  able  to  show  a  good  average  cost  saving  by 
using  the  cost  feature  to  order  questions.  Of  course,  there  is  no  guarantee  that  a  clinician  would 
order  tests  as  suggested  by  the  question  ordering  of  our  system.  However,  the  potential  for  cost 
savings  is  significant. 

The  system  is  Web  based  and  password  protected.  It  provides  rapid  response  when  a  person 
enters  answers  to  one  or  more  questions  on  a  page  of  system  selected  questions.  It  can  be  used 
from  any  computer  on  the  World  Wide  Web.  Hence,  community  physicians  will  be  able  to 
determine  the  potential  eligibility  (they  may  not  wish  to  run  all  tests)  of  the  patient  for  clinical 
trials  at  cancer  centers  in  their  region. 

A  prototype  to  enable  physicians,  nurses  or  technicians  to  enter  new  protocols  has  been 
completed.  The  system  is  now  in  use.  It  reduces  the  time  required  to  add  a  new  trial  or  protocol 
to  approximately  1  hour.  It  enables  non-computer  scientists  to  add  trial/protocols  to  the  system. 
This  knowledge  acquisition  tool  has  been  designed  to  minimize/eliminate  the  cases  where  similar 
questions  acquiring  essentially  the  same  information  would  have  to  be  asked.  This  feature  has 
the  potential  to  cause  slight  changes  to  the  wording  of  inclusion/exclusion  criteria.  We  believe 
that  this  change  is  minor  and  will  have  no  effect  on  IRB  approval. 

Last year, we intended to evaluate whether IRB approval would be affected, but institutional
issues prevented this. This year, we will have new protocols entered using existing
questions and plan to go back to the IRB to discuss any changes in criteria wording made to fit
existing questions within the system. An example would be a protocol in which there is one
question that asks whether a test value is greater than some threshold and a separate question
that asks whether it is less than some threshold, versus a single question that asks whether the
test value is in some range. We believe that such a change is trivial, but this must be addressed
in practice, and we will evaluate whether it could cause review board decisions to change.

We have utilized Bayes' rule to provide a likelihood prediction of a patient being eligible for
particular trials at any given step, once enough patients have been put through the system for
that trial. It appears quite useful. A version of it has been used to minimize the number of
questions that must be answered before a patient is ruled ineligible for a protocol. The
reduction in the number of questions needed was between 13 and 22 percent.
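
One way such a likelihood estimate could be computed is sketched below. The report does not spell out the model, so this sketch assumes a naive Bayes formulation (answers treated as independent given eligibility) with add-one smoothing suited to yes/no answers; the function name and data layout are hypothetical.

```python
def eligibility_posterior(history, answers):
    """Estimate P(eligible | answers so far) for one trial via Bayes' rule.

    history: non-empty list of (answers_dict, eligible_bool) for past patients
             who were put through the system for this trial.
    answers: dict of the answers known so far for the current patient.
    Assumes answers are conditionally independent given eligibility.
    """
    scores = {}
    for label in (True, False):
        group = [a for a, eligible in history if eligible == label]
        score = len(group) / len(history)  # prior P(label)
        for question, value in answers.items():
            matches = sum(1 for a in group if a.get(question) == value)
            # add-one smoothed estimate of P(answer | label)
            score *= (matches + 1) / (len(group) + 2)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])
```

With no answers entered, the estimate is simply the fraction of past patients who were eligible; each new answer then shifts it up or down.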

5.1  The  future 

The prototype system shows the potential for allowing community physicians, as well as cancer
center physicians, to quickly and cost-effectively determine for which clinical trials a patient may
be eligible. It holds the promise of enabling greater patient accrual for trials by increasing the
awareness of each trial among treating physicians throughout a region. In the future, we would like
to change our IRB protocol to allow evaluation of how many patients not enrolled in clinical trials
were actually missed by clinical practitioners, as opposed to being excluded for a particular reason
(e.g., it was clear they would not agree) or being offered a trial and declining to enter it.

Summary  The purpose of a clinical trial is to evaluate a new treatment procedure.
When medical researchers conduct a trial, they recruit participants with appropriate
health problems and medical histories. To select participants, they analyze medical
records of the available patients, which has traditionally been a manual procedure.

We describe an expert system that helps to select patients for clinical trials. If the
available data are insufficient for choosing patients, the system suggests additional
medical tests and finds an ordering of the tests that reduces their total cost. Experiments
show that the system can increase the number of selected patients. We also
present an interface that enables a medical researcher to add clinical trials and
selection criteria without the help of a programmer. The addition of a new trial takes
10-20 min, and novice users learn the functionality of the interface in about an hour.
© 2004 Elsevier B.V. All rights reserved.


1.  Introduction 

Cancer causes 550,000 deaths in the United States
every year [1,2], and the treatment of cancer is an
active research area. Medical researchers explore
new treatment methods, such as drugs, surgery
techniques, and radiation therapies. An experiment
with a new treatment procedure is called a clinical
trial. When researchers conduct a trial, they recruit
patients with appropriate cancer types and medical
histories. The selection of patients has traditionally
been a manual procedure, and studies have shown
that clinicians can miss up to 60% of the eligible
patients [3-8].

If the available records do not provide enough
data, clinicians perform medical tests as part of the
selection process. The costs of most tests have
declined over the last decade, but the number of
tests has increased [9,10], which is partially due to
inappropriate ordering of tests [11,12]. Clinicians
can reduce the cost by first requiring inexpensive
tests and then using their results to avoid some
expensive tests; however, finding the right ordering
may be a complex optimization problem.

*Corresponding author.
E-mail addresses: e.fink@cs.cmu.edu (E. Fink), kokku@csee.usf.edu
(P.K. Kokku), savvasn@ucy.ac.cy (S. Nikiforou),
hall@csee.usf.edu (L.O. Hall), goldgof@csee.usf.edu (D.B. Goldgof),
jpkrischer@moffitt.usf.edu (J.P. Krischer).

The purpose of the described work is to automate
the selection of patients for clinical trials and minimize
the cost of related tests. We have developed
an expert system that identifies appropriate trials
for eligible cancer patients, designed a web-based
interface that enables a clinician to enter new trials
without the help of a programmer, and built a
knowledge base for trials at the Moffitt Cancer
Center, located at the University of South Florida.

We begin with a review of the previous work on
medical expert systems (Section 2). We then explain
the design of the developed system and present
empirical confirmation of its effectiveness (Section 3).
We also describe the interface for adding
new knowledge (Section 4). In conclusion, we point
out some limitations of the developed system and
compare it with other trial-selection systems
(Section 5).


2.  Previous  work 

Researchers have developed several expert systems
that help to select clinical trials for cancer
and AIDS patients. In particular, Musen et al. built
a rule-based system, called EON, that matched
AIDS patients to clinical trials [13]. Ohno-Machado
et al. developed the AIDS2 system, which also
assigned AIDS patients to clinical trials [14].
They integrated logical rules with Bayesian networks,
which helped to make decisions based on
incomplete data and to quantify the decision
certainty.

Bouaud et al. created a cancer expert system,
called ONCODOC, that suggested alternative clinical
trials for each patient and allowed a physician to
choose among them [15,16]. Seroussi et al. used
ONCODOC to select participants for clinical trials at
two hospitals, which helped to increase the number
of selected patients by a factor of 3 [17,18].

Hammond and Sergot created the OaSiS architecture
[19], which had a graphical interface for
entering patients’ data and extending the knowledge
base. Smith et al. built a qualitative system
that assisted a clinician in selecting medical tests,
interpreting their results, and reducing the number
and cost of tests [9,20].

Theocharous developed a Bayesian system that
selected clinical trials for cancer patients [21,22]. It
learned conditional probabilities of medical-test
outcomes and evaluated the probability of a
patient’s eligibility for each trial. On the negative
side, the available medical records were often
insufficient for learning accurate probabilities.
Furthermore, when adding a new trial, the user
had to change the structure of the underlying Bayesian
network. To address these problems, Bhanja
et al. built a qualitative rule-based system for the
same task [23].

Breitfeld et al. built a system that pre-selected
potential participants for three clinical trials
related to a specific cancer, called rhabdomyosarcoma
[24]. Their system asked eight questions
about a patient, and used a decision tree to determine
a patient’s eligibility. The questions did not
cover some relevant factors, and a physician had to
make a final eligibility decision for pre-selected
patients. The authors used trial-specific information
in building their system, and they pointed out
that extending the system to include other trials
would require a major effort.

Fallowfield et al. studied how physicians selected
cancer patients for clinical trials, and compared
manual and automatic selection [25]. They showed
that expert systems could improve the selection
accuracy; however, their study also revealed that
physicians were reluctant to use these systems.
Carlson et al. conducted similar studies with AIDS
trials, and also concluded that expert systems could
lead to a more accurate selection [26].

Researchers have also investigated various representations
of medical knowledge. In particular,
Ohno-Machado et al. proposed the Guideline Interchange
Format for medical knowledge [27]. Lindberg
et al. considered an alternative format, called
the Unified Medical Language System, and developed
tools for converting various databases into this
format [28]. Rubin et al. analyzed selection criteria
for cancer clinical trials and proposed a format for
these criteria [29,30]. Wang et al. compared eight
previously developed formats and identified main
elements of medical knowledge, which included
patient data, treatment decisions, and related
actions [31].

Eriksson pointed out the need for general-purpose
tools that would allow efficient knowledge
acquisition, and described a system for building
such tools [32]. Tallis et al. developed a library of
scripts for modifying knowledge bases, which
helped to enforce the consistency of the modified
knowledge [33-35]. Blythe et al. designed a general
knowledge-acquisition interface based on previous
techniques [36]. Musen developed the PROTÉGÉ environment
for creating knowledge-acquisition tools
[37]; later, researchers used it in the work on AIDS
expert systems [38,39], and on an asthma treatment-selection
system [40]. Musen et al. extended
PROTÉGÉ and built a new version, called PROTÉGÉ-2000
[41].


3.  Selection  of  clinical  trials 

Physicians  at  the  Moffitt  Cancer  Center  have  about 
150  clinical  trials  available  for  cancer  patients. 
They  have  identified  criteria  that  determine  a 
patient’s  eligibility  for  each  trial,  and  they  use 
these  criteria  to  select  trials  for  eligible  patients. 
Traditionally,  physicians  have  selected  trials  by  a 
manual  analysis  of  patients’  data.  The  review  of 
resulting  selections  has  shown  that  they  usually  do 
not  check  all  clinical  trials  and  occasionally  miss  an 
appropriate  trial. 

To address this problem, we have developed an
expert system that helps to select trials for each
patient. It prompts a clinician to enter the results of
medical tests, and uses them to identify appropriate
trials. If the available records do not provide
enough data, the system suggests additional tests.
We give an example of the selection process,
describe the main elements of the knowledge base,
and outline the system’s web-based interface. We
then give experimental results, which confirm that
the system helps to find eligible patients and to
reduce the related costs.

(a) Eligibility criteria.

1. The patient is female.
2. She is at most 45 years old.
3. Her cancer stage is II or III.
4. Her cancer is not invasive.
5. At most 3 lymph nodes have cancer.
6. Either
   • there are no cardiac arrhythmias,
   • or all tumors are at most 2.5 cm.

(b) Tests and questions.

General information
   What is the patient’s sex?
   What is the patient’s age?
Mammogram. Cost is $150
   What is the cancer stage?
   Is the cancer invasive?
Biopsy. Cost is $400
   What is the cancer stage?
   How many lymph nodes have cancer?
   What is the greatest tumor diameter?
Electrocardiogram. Cost is $200
   Are there cardiac arrhythmias?

Figure 1 Example of eligibility criteria (a), and tests and questions (b).

3.1.  Example 

In Fig. 1(a), we give a simplified example of eligibility
criteria for a clinical trial. This trial is for
young and middle-aged women with a noninvasive
cancer at stage II or III. When testing a patient’s
eligibility, a clinician has to order three medical
tests (Fig. 1(b)).

The system first prompts a clinician to enter
the patient’s sex and age. If the patient satisfies
the corresponding conditions, the system asks for
the mammogram results and verifies Conditions 3
and 4; then, it requests the biopsy and electrocardiogram
data. The ordering of tests depends on
their costs and on the amount of information
provided by test results. The system begins with
the mammogram because it is cheaper than the
other tests and provides data for two eligibility
criteria.

If the patient’s records already include some test
results, the clinician can answer the corresponding
questions while entering the personal data, before
the system selects tests. For example, if the records
indicate that the cancer stage is IV, the clinician can
enter the stage along with the sex and age, and the
system immediately determines that the patient is
ineligible for this trial.
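
To make the trade-off between cost and information concrete, the sketch below orders the tests of Fig. 1(b) greedily by cost per still-unanswered question. This greedy ratio is an assumed stand-in chosen for illustration, not the heuristic actually used by the system, and the function and question names are hypothetical.

```python
def order_tests(tests, needed):
    """Greedy ordering: repeatedly pick the test with the lowest cost per
    still-unanswered question it would answer.

    tests:  list of (name, cost, questions_answered)
    needed: questions that must be answered to decide eligibility
    Ties are broken by the order in which the tests are listed.
    """
    remaining = set(needed)
    pending = list(tests)
    order = []
    while remaining and pending:
        useful = [t for t in pending if remaining & set(t[2])]
        if not useful:
            break  # no listed test answers the remaining questions
        best = min(useful, key=lambda t: t[1] / len(remaining & set(t[2])))
        order.append(best[0])
        remaining -= set(best[2])
        pending.remove(best)
    return order


# Tests and questions from Fig. 1(b)
fig1_tests = [
    ("Mammogram", 150, ["cancer_stage", "invasive_cancer"]),
    ("Biopsy", 400, ["cancer_stage", "lymph_nodes", "tumor_diameter"]),
    ("Electrocardiogram", 200, ["arrhythmias"]),
]
fig1_needed = ["cancer_stage", "invasive_cancer", "lymph_nodes",
               "tumor_diameter", "arrhythmias"]
```

On the Fig. 1 data this sketch also starts with the mammogram, which costs the least and answers two questions; in practice later tests would be skipped as soon as an answer rules the patient out.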


3.2.  Knowledge  base 

The knowledge base includes questions, medical
tests, and logical expressions that represent eligibility
criteria for each trial. Since clinicians specify
eligibility criteria as hard constraints, without priorities
or soft constraints, we allow only hard-constraint
logical expressions. The system does not
prioritize eligibility criteria, and it treats the results
of medical tests in the same way as other data, such
as sex, age, and medical history. We give a simplified
example of tests and questions in Fig. 1(b), and
logical expressions in Fig. 2.

The  system  supports  three  types  of  questions; 
the  first  type  takes  a  yes/no  response,  the  second  is 
multiple  choice,  and  the  third  requires  a  numeric 
answer.  For  example,  the  cancer  stage  is  a  multiple- 
choice  question,  and  the  tumor  diameter  is  a 
numeric  question.  The  description  of  a  medical  test 
includes  the  test  name,  dollar  cost,  and  list  of 
questions  that  can  be  answered  based  on  the  test 
results. For instance, the mammogram in Fig. 1 has
a cost of US$ 150, and it allows the answering of two
questions. Different tests may answer the same
question; for example, both mammogram and
biopsy show the cancer stage.


sex = FEMALE and
age ≤ 45 and
cancer-stage ∈ {II, III} and
invasive-cancer = NO and
lymph-nodes ≤ 3 and
(arrhythmias = NO or
 tumor-diameter ≤ 2.5)

(a) Acceptance expression.

sex = MALE or
age > 45 or
cancer-stage ∈ {I, IV} or
invasive-cancer = YES or
lymph-nodes > 3 or
(arrhythmias = YES and
 tumor-diameter > 2.5)

(b) Rejection expression.

Figure 2 Logical expressions for the criteria in Fig. 1(a).
The acceptance expression (a) represents the eligibility
conditions, whereas the rejection expression (b) describes
ineligible patients.

We  encode  the  eligibility  for  a  trial  by  a  logical 
expression  that  does  not  have  negations,  called  the 
acceptance  expression.  It  includes  variables  that 
represent  the  available  medical  data,  as  well  as 
equalities,  inequalities,  "set-element”  relations, 
conjunctions,  and  disjunctions.  For  example,  we 
encode  the  criteria  in  Fig.  1(a)  by  the  expression 
in  Fig.  2(a).  In  addition,  the  system  uses  the  logical 
complement  of  the  eligibility  criteria,  called  the 
rejection  expression,  which  also  does  not  have 
negations  (Fig.  2(b));  it  describes  the  conditions 
that  make  a  patient  ineligible  for  the  trial. 

The  system  collects  data  until  it  can  determine 
which  of  the  two  expressions  is  true.  For  instance,  if 
a  patient’s  sex  is  male,  then  the  rejection  expression 
in  Fig.  2(b)  is  true,  and  the  system  immediately 
rejects  this  trial.  If  the  sex  is  female,  and  the  other 
values  are  unknown,  then  neither  acceptance  nor 
rejection  expression  is  true,  and  the  system  asks 
more  questions. 
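
The evaluation with partially known data can be sketched as a three-valued check, where each expression is true, false, or still undetermined. The code below is an illustrative sketch under stated assumptions, not the system's implementation: the tuple encoding and the function names are hypothetical, and the comparisons in Fig. 2 are read as "at most" (for example, age at most 45).

```python
def evaluate(expr, data):
    """Return True, False, or None (undetermined) for an expression.

    expr is ("and", [subexprs]), ("or", [subexprs]), or
    ("atom", variable, predicate); data maps known variables to values.
    """
    kind = expr[0]
    if kind == "atom":
        _, var, pred = expr
        return pred(data[var]) if var in data else None
    results = [evaluate(e, data) for e in expr[1]]
    if kind == "and":
        if any(r is False for r in results):
            return False
        return True if all(r is True for r in results) else None
    if kind == "or":
        if any(r is True for r in results):
            return True
        return False if all(r is False for r in results) else None
    raise ValueError(f"unknown node type: {kind}")


def decide(acceptance, rejection, data):
    """Collecting data stops as soon as one of the two expressions is true."""
    if evaluate(acceptance, data) is True:
        return "eligible"
    if evaluate(rejection, data) is True:
        return "ineligible"
    return "need more data"


# Expressions of Fig. 2, in the hypothetical tuple encoding.
ACCEPT = ("and", [
    ("atom", "sex", lambda v: v == "FEMALE"),
    ("atom", "age", lambda v: v <= 45),
    ("atom", "cancer-stage", lambda v: v in {"II", "III"}),
    ("atom", "invasive-cancer", lambda v: v == "NO"),
    ("atom", "lymph-nodes", lambda v: v <= 3),
    ("or", [
        ("atom", "arrhythmias", lambda v: v == "NO"),
        ("atom", "tumor-diameter", lambda v: v <= 2.5),
    ]),
])
REJECT = ("or", [
    ("atom", "sex", lambda v: v == "MALE"),
    ("atom", "age", lambda v: v > 45),
    ("atom", "cancer-stage", lambda v: v in {"I", "IV"}),
    ("atom", "invasive-cancer", lambda v: v == "YES"),
    ("atom", "lymph-nodes", lambda v: v > 3),
    ("and", [
        ("atom", "arrhythmias", lambda v: v == "YES"),
        ("atom", "tumor-diameter", lambda v: v > 2.5),
    ]),
])

print(decide(ACCEPT, REJECT, {"sex": "MALE"}))
print(decide(ACCEPT, REJECT, {"sex": "FEMALE"}))
```

With only the sex known, a male patient is rejected at once, whereas a female patient leaves both expressions undetermined, which matches the two cases described above.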

If  the  knowledge  base  includes  multiple  clinical 
trials,  the  system  checks  a  patient’s  eligibility  for 
each  of  them.  It  first  prompts  the  clinician  to  enter 
the  personal  data  for  a  patient,  then  asks  for  the 
tests  related  to  multiple  trials,  and  finally  requests 
additional  tests  for  specific  trials.  After  getting 
each  new  answer,  the  system  re-evaluates  the 
patient’s  eligibility  for  each  trial.  It  displays  the 
list  of  matching  trials,  rejected  trials,  and  trials  that 
require  additional  information. 

3.3.  Order  of  tests 

If  a  patient’s  medical  records  do  not  include  enough 
data,  the  system  asks  for  additional  tests;  for  exam¬ 
ple,  if  the  records  do  not  provide  data  for  the 
eligibility  criteria  in  Fig.  1,  the  system  asks  for 
the  mammogram,  biopsy,  and  electrocardiogram. 
The  total  cost  of  tests  may  depend  on  their  order; 
for  instance,  if  we  begin  with  the  mammogram,  and 
it  shows  that  the  cancer  stage  is  IV,  then  we  can 
immediately  reject  the  trial  in  Fig.  1  and  avoid  the 
more  expensive  tests. 

We  have  explored  heuristics  for  ordering  the 
tests  based  on  the  test  costs  and  the  structure  of 
acceptance  and  rejection  expressions.  The  heuris¬ 
tics  use  a  disjunctive  normal  form  of  these  expres¬ 
sions;  that  is,  each  expression  must  be  a  disjunction 
of  conjunctions.  For  example,  the  rejection  expres¬ 
sion  in  Fig.  2(b)  is  in  disjunctive  normal  form, 
whereas  the  acceptance  expression  in  Fig.  2(a)  is 
not.  If  the  system  uses  ordering  heuristics,  it  has  to 
convert  this  acceptance  expression  into  the  disjunc¬ 
tive  normal  form  shown  in  Fig.  3. 


(sex = FEMALE and
 age ≤ 45 and
 cancer-stage ∈ {II, III} and
 invasive-cancer = NO and
 lymph-nodes ≤ 3 and
 arrhythmias = NO)
or
(sex = FEMALE and
 age ≤ 45 and
 cancer-stage ∈ {II, III} and
 invasive-cancer = NO and
 lymph-nodes ≤ 3 and
 tumor-diameter ≤ 2.5)


Figure 3 Disjunctive normal form of the acceptance
expression in Fig. 2.


The  system  chooses  the  order  of  tests  that 
reduces  their  expected  cost.  After  getting  the 
results  of  the  first  test,  it  re-evaluates  the  need 
for  the  other  tests  and  revises  their  ordering.  The 
choice  of  the  first  test  is  based  on  three  criteria.  The 
system  scores  all  required  tests  according  to  these 
criteria,  computes  a  linear  combination  of  the  three 
scores  for  every  test,  and  chooses  the  test  with  the 
highest  score. 

(1) Cost of the test: The system gives preference
to cheaper tests. For instance, it may start
with the mammogram, which is cheaper than
the other two tests in Fig. 1.

(2)  Number  of  clinical  trials  that  require  the  test: 
When  the  system  checks  a  patient’s  eligibility 
for  several  trials,  it  prefers  tests  that  provide 
data  for  the  largest  number  of  trials.  For 
example,  if  the  electrocardiogram  gives  data 
for  two  different  trials,  whereas  the  mammo¬ 
gram  provides  data  for  only  one  trial,  the 
system  may  prefer  the  electrocardiogram 
despite  its  higher  cost. 

(3)  Number  of  clauses  that  include  the  test 
results:  The  system  prefers  the  tests  that 
provide  data  for  the  largest  number  of  clauses 
in  the  acceptance  and  rejection  expressions. 
For  example,  the  mammogram  data  affect 
both  clauses  of  the  acceptance  expression  in 
Fig. 3 and two clauses of the rejection
expression in Fig. 2(b). On the other hand,
the  electrocardiogram  affects  only  one  clause 
of  the  acceptance  expression  and  one  clause  of 
the  rejection  expression;  thus,  the  system 
should  order  it  after  the  mammogram. 
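
The three criteria above can be combined into a single score roughly as follows. This is only a sketch of the idea of a linear combination: the weights, the normalization of the cost, the class and function names, and the electrocardiogram and biopsy figures are all assumptions, since the text does not give them. The mammogram cost (US$ 150) and the clause counts for the mammogram and electrocardiogram follow the example in criterion (3).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost: float    # dollar cost; 0 if the test is performed anyway
    trials: int    # number of clinical trials that require the test
    clauses: int   # number of clauses that include the test results


def score(test, max_cost, w_cost=1.0, w_trials=1.0, w_clauses=1.0):
    """Linear combination of the three criteria (weights are hypothetical)."""
    # Cheaper tests get a cheapness value closer to 1.
    cheapness = 1.0 - test.cost / max_cost if max_cost > 0 else 1.0
    return w_cost * cheapness + w_trials * test.trials + w_clauses * test.clauses


def best_test(tests, **weights):
    """Choose the test with the highest combined score."""
    max_cost = max(t.cost for t in tests)
    return max(tests, key=lambda t: score(t, max_cost, **weights))


# Single trial: the mammogram affects 2 + 2 clauses, the ECG 1 + 1.
single = [
    Candidate("mammogram", 150.0, trials=1, clauses=4),
    Candidate("electrocardiogram", 300.0, trials=1, clauses=2),  # assumed cost
    Candidate("biopsy", 400.0, trials=1, clauses=3),             # assumed values
]
print(best_test(single).name)

# Several trials: the ECG serves two trials, and trials are weighted heavily.
several = [
    Candidate("mammogram", 150.0, trials=1, clauses=4),
    Candidate("electrocardiogram", 300.0, trials=2, clauses=2),
]
print(best_test(several, w_trials=3.0).name)
```

With equal weights the cheaper and more informative mammogram wins; with a larger weight on the number of trials, the electrocardiogram can win despite its higher cost, as in the example for criterion (2).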

The  system  disregards  the  costs  of  tests  performed 
in  the  normal  course  of  treatment,  and  accounts 
only  for  the  costs  related  to  the  selection  of  clinical 
trials.  For  example,  if  a  patient  needs  the  mammo¬ 
gram  regardless  of  trial  participation,  the  system 
views  it  as  a  zero-cost  test.  On  the  other  hand,  if  the 
only  purpose  of  the  biopsy  and  electrocardiogram  is 
to  select  trials,  the  system  uses  heuristics  to  order 
these  tests. 

Although the system suggests the single most
effective test, it allows a clinician to order multiple
tests at once. For instance, if it indicates that the
mammogram is the best test, the clinician can
determine that the electrocardiogram is also an
effective test, and order both tests at the same
time.


Adding patients
• Add a new patient
• Find an old patient

Selecting clinical trials
• Choose candidate trials
• View available trials

Entering initial data
• Answer initial questions
• Change previous answers

Entering medical data
• Enter test results
• View eligibility decisions

Revising medical data
• View test results
• Change some results


Figure 4 Entering a patient’s data. The web-based interface for the data entry consists of five screens. We show
these screens by rectangles and the transitions between them by arrows.

3.4.  User  interface 

The  system  includes  a  web-based  interface  that 
allows  clinicians  to  enter  patients’  data  through 
remote  computers;  the  interface  consists  of  five 
screens  (Fig.  4). 


The  start  screen  is  for  adding  new  patients  and 
retrieving  old  patients  (Fig.  5).  After  a  user  enters  a 
patient’s  name,  the  system  displays  a  list  of  the 
available  trials  (Fig.  6).  The  user  can  choose  a 
subset  of  these  trials,  and  then  the  system  checks 
eligibility  only  for  the  selected  trials.  The  next 
screen  is  for  basic  personal  and  medical  data,  such 
as  sex,  age,  and  cancer  stage  (Fig.  7). 

After the system gets these basic data, it prompts
the user for medical information related to specific
trials (Fig. 8). When the user enters medical data,
the system continuously re-evaluates the patient’s
eligibility and shows the decision for each trial. If
the patient is ineligible for some trials, the user can
find out the reasons by clicking the “Why” button.
The interface also includes a screen for the review
and modification of the previous answers, similar to
the screen in Fig. 8.

Figure 6 Selecting clinical trials.

Figure 7 Entering basic information for a patient.

Figure 8 Entering medical data.

3.5.  Experiments 

We  have  built  a  knowledge  base  for  the  breast- 
cancer  clinical  trials  at  the  Moffitt  Cancer  Center, 
applied  the  system  to  the  retrospective  data  from 
187  past  patients  and  74  current  patients,  and 
compared  the  results  with  manual  selection  by 
Moffitt  clinicians.  The  number  of  matching  trials 
for  a  patient  has  ranged  from  zero  to  three.  For 
most  patients,  the  system  rejects  most  trials  during 
the  initial  entry  of  basic  data,  such  as  sex,  age,  and 
cancer  stage.  It  usually  identifies  two  to  five  poten¬ 
tial  matches  based  on  these  basic  data,  and  narrows 
the  selection  down  to  one  or  two  trials  based  on  the 
following  trial-specific  questions. 

We  summarize  the  results  for  the  past  patients  in 
Table  1  (a),  and  the  results  for  the  current  patients 
in  Table  1(b).  The  "same  matches”  column 
includes  the  number  of  patients  who  have  been 
selected  by  both  human  clinicians  and  the  expert 
system.  The  "new  matches”  column  gives  the 
number  of  patients  who  have  been  matched  by 
the  system  but  missed  by  human  clinicians.  Finally, 
the  last  column  shows  the  number  of  matching 
patients  whose  available  records  are  incomplete. 
Clinicians  have  found  trials  for  these  patients,  but 
the  system  cannot  identify  these  matches  because 
of  insufficient  data.  Since  these  patients  are  no 
longer  at  Moffitt,  we  cannot  obtain  the  missing 
data;  note  that  this  problem  is  due  to  the  use  of 
retrospective  data,  and  it  does  not  arise  when 
clinicians  select  trials  for  new  patients. 


The  system  has  identified  a  number  of  situations 
when  patients  were  eligible  for  clinical  trials,  but 
did  not  participate  in  these  trials.  We  have  checked 
these  results  with  Moffitt  clinicians,  and  they  have 
confirmed  that  all  matches  are  correct.  In  most 
cases,  patients  did  not  participate  in  the  matching 
trials  because  clinicians  missed  these  matches; 
however,  for  some  of  the  past  cases,  we  have  been 
unable  to  verify  that  physicians  actually  missed  the 
matches,  rather  than  having  undocumented  reasons 
for  omitting  them. 

We show the mean test costs with and without
the ordering heuristics in Table 2, and give a
graphical view of the cost savings in Fig. 9. The
results confirm that the heuristics reduce the cost of
the selection process. Five clinical trials have
incurred selection costs; the heuristics have significantly
reduced the costs for four of these trials, and
have not affected the cost for the fifth trial. The
other trials have not incurred costs because all
related tests were performed in the normal course
of treatment before the trial selection.


Table 1 Results of matching 187 past patients and 74
current patients. We give the number of matches found by both the
expert system and human clinicians, as well as the
number of new matches identified by the system. We
also show the number of matches missed by the
system because of insufficient data.

Clinical trial    Same matches    New matches    Missing data

(a) Results for the 187 past patients
…
(b) Results for the 74 current patients
…


Table 2 Cost savings by test reordering

                  Average dollar cost
Clinical trial    Without test reordering    With test reordering

(a) Results for the 187 past patients
…                 $70                        $11
10840             $0                         $0
11072             $209                       $60
11378             $0                         $0
…
12101             $0                         …

(b) Results for the 74 current patients
…                 $0, $314, $0, $0, $0 (trial numbers and column positions not recoverable)
12601             $64                        $6
12775             $0                         $0

3.6.  Scalability 

The  time  complexity  of  evaluating  the  acceptance 
and  rejection  expressions  is  linear  in  their  total  size. 
Experiments  on  a  Sun  Ultra  10  have  shown  that  the 
evaluation  takes  about  0.02  s  per  question,  and  the 
time  is  linear  in  the  number  of  questions.  Typical 
eligibility conditions for a clinical trial include 10-30
questions; thus, the evaluation time is 0.2-0.6 s
per trial.


The  linear  scalability  is  an  advantage  over  Baye¬ 
sian  systems,  which  usually  do  not  scale  to  a  large 
number  of  clinical  trials  [14,42,43].  The  authors  of 
these  systems  have  reported  that  the  sizes  of  the 
underlying  networks  are  superlinear  in  the  number 
of  trials  [44,45],  and  that  the  training  time  is  super- 
linear  in  the  network  size  [21,22]. 

If  the  system  uses  the  cost-reduction  heuristics, 
it  converts  the  acceptance  and  rejection  expres¬ 
sions  into  disjunctive  normal  form,  which  can 
potentially  lead  to  an  explosion  in  their  size.  For 
example,  if  eligibility  conditions  are  as  shown  in 
Fig.  10(a),  the  system  initially  generates  the  expres¬ 
sion  in  Fig.  10(b).  If  the  system  converts  it  to 
disjunctive  normal  form,  the  resulting  expression 
consists  of  eight  clauses. 

Although  the  conversion  may  result  in  impracti- 
cally  large  expressions,  experiments  with  cancer 
trials  have  shown  that  this  problem  does  not  arise 
in  practice  because  the  number  of  nested  disjunc¬ 
tions  is  usually  small.  Furthermore,  we  can  elim¬ 
inate  some  disjunctions  by  combining  their 
elements  into  longer  questions.  For  instance,  we 
can  represent  Condition  3  in  Fig.  10(a)  by  a  single 
question:  "Does  the  patient  have  both  invasive  and 
recurrent  cancer?”  If  we  apply  this  modification  to 
Conditions  3  and  5,  then  we  obtain  the  expression  in 
Fig.  10(c),  and  its  conversion  to  disjunctive  normal 
form  results  in  an  expression  with  two  clauses. 
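
The clause counts quoted above can be checked with a small conversion routine. The sketch below assumes a hypothetical encoding in which a condition is a string and compound expressions are ("and", [...]) or ("or", [...]) tuples; it distributes conjunctions over disjunctions without any further simplification, so it is not necessarily the conversion algorithm used by the system.

```python
from itertools import product


def to_dnf(expr):
    """Convert a negation-free and/or expression into a list of clauses.

    Each clause is a tuple of conditions joined by "and";
    the clauses themselves are joined by "or".
    """
    if isinstance(expr, str):
        return [(expr,)]
    op, parts = expr
    sub = [to_dnf(p) for p in parts]
    if op == "or":
        # A disjunction simply collects the clauses of its parts.
        return [clause for clauses in sub for clause in clauses]
    if op == "and":
        # Distribute "and" over "or": pick one clause from every part.
        return [tuple(c for clause in combo for c in clause)
                for combo in product(*sub)]
    raise ValueError(f"unknown operator: {op}")


# Acceptance expression of Fig. 10(b).
FIG_10B = ("and", [
    "sex = FEMALE",
    "age <= 45",
    ("or", ["invasive = NO", "recurrent = NO"]),
    ("or", ["lymph-nodes <= 3", "tumor-size <= 2.5"]),
    ("or", ["arrhythmias = NO", "congenital = NO"]),
])

# Reduced expression of Fig. 10(c).
FIG_10C = ("and", [
    "sex = FEMALE",
    "age <= 45",
    "invasive-and-recurrent = NO",
    ("or", ["lymph-nodes <= 3", "tumor-size <= 2.5"]),
    "arrhythmias-and-congenital = NO",
])

print(len(to_dnf(FIG_10B)), len(to_dnf(FIG_10C)))
```

Each two-way disjunction doubles the number of clauses, so three of them give 2 × 2 × 2 = 8 clauses, and replacing two of them with single questions leaves 2.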


4.  Entering  eligibility  criteria 

We  describe  a  web-based  interface  for  adding  new 
clinical  trials  [46],  which  consists  of  two  main  parts; 
the  first  part  is  for  information  about  medical  tests 
(Fig.  11),  and  the  second  is  for  eligibility  criteria 
(Fig.  12).  The  interface  includes  fifteen  screens; 
three  of  them  are  "start  screens,”  which  can  be 
reached  from  any  other  screen.  We  give  an  example 
of  entering  eligibility  criteria,  describe  the  two 
parts  of  the  interface,  and  present  experiments 
to illustrate its effectiveness.

Figure 9 Costs with and without test reordering, for 187 past patients (a) and 74 current patients (b).


1. The patient is female.

2. She is at most 45 years old.

3. Either
   • her cancer is not invasive, or
   • her cancer is not recurrent.

4. Either
   • at most 3 lymph nodes have cancer, or
   • all tumors are at most 2.5 cm.

5. Either
   • there are no cardiac arrhythmias, or
   • there is no congenital heart disease.

(a) Eligibility criteria.


sex = FEMALE and
age ≤ 45 and
(invasive = NO or
 recurrent = NO) and
(lymph-nodes ≤ 3 or
 tumor-size ≤ 2.5) and
(arrhythmias = NO or
 congenital = NO)

(b) Acceptance expression.


sex = FEMALE and
age ≤ 45 and
invasive-and-recurrent = NO and
(lymph-nodes ≤ 3 or
 tumor-size ≤ 2.5) and
arrhythmias-and-congenital = NO

(c) Reduced expression.

Figure 10 Reducing the number of disjunctions. The conversion of the eligibility criteria (a) into a logical expression
(b) leads to an explosion in the size of the corresponding disjunctive normal form. We may prevent the explosion by
replacing some disjunctions with single questions (c).


4.1.  Example 

Suppose  that  a  user  needs  to  enter  the  criteria 
shown  in  Fig.  1,  and  the  system  initially  has  no  data 
about  the  related  tests.  The  user  has  to  describe  the 
tests  and  questions,  and  specify  the  eligibility  con¬ 
ditions. 

First, the user utilizes the “Adding tests” screen
to enter the new tests; we illustrate the entry of a
test in Fig. 13. Then, the user adds the related
questions; to enter questions for a specific test,
the user selects the test and clicks “Modify”
(Fig. 14), and the system displays the “Modifying
a test” screen (Fig. 15). To add a question, the user
clicks the appropriate button at the bottom (Fig. 15)
and then types the question (Fig. 16).

After adding the questions for all tests, the user
goes to the “Adding clinical trials” screen and
initializes a new trial (Fig. 17). The user gets the
“Selecting tests” screen and chooses the tests
related to the current trial (Fig. 18). Then, the user
marks relevant questions and the answers that make
a patient eligible (Fig. 19). If the eligibility criteria
include disjunctions, the user has to utilize the
screen for composing logical expressions (Fig. 20).


Figure 11 Entering tests and questions (a), and general questions (b). We show the screens by rectangles and the
transitions between them by arrows; the bold rectangles are the start screens.

Figure 12 Entering eligibility criteria.

4.2.  Tests  and  questions 

We now describe the six-screen interface for adding
tests and questions (Fig. 11a). The start screen
allows viewing the available tests and defining
new ones, whereas the other screens are for modifying
tests and adding questions.

Figure 14 Selecting a test for entering the related questions.

Figure 16 Adding yes/no questions (a) and multiple-choice questions (b); the user types a question and the answer
options.

Figure 17 Adding a new clinical trial.

Figure 18 Choosing tests and question types.

Figure 19 Selecting questions and answers. The user checks the questions for the current clinical trial and marks the
answers that satisfy the eligibility criteria.

We  show  the  start  screen  in  Fig.  13;  its  left-hand 
side  allows  viewing  questions  and  going  to  a  mod¬ 
ification  screen.  If  the  user  selects  a  test  and  clicks 
"View,”  the  system  shows  the  questions  related  to 
this  test  at  the  bottom  of  the  same  screen.  If  the 
user clicks “Modify,” it displays the “Modifying a
test”  screen  (Fig.  15).  The  right-hand  side  of  the 
start  screen  allows  adding  a  new  test  by  specifying 
its  name,  cost,  and  pain  level. 

The  "Modifying  a  test”  screen  shows  the  infor¬ 
mation  about  a  specific  test,  which  includes  the  test 
name,  cost,  pain  level,  and  related  questions.  The 
user  can  change  the  test  name,  cost,  and  pain  level; 
the  four  bottom  buttons  allow  moving  to  the  screens 
for  adding  and  deleting  questions. 

We  show  the  screen  for  adding  yes/no  questions  in 
Fig.  16(a)  and  multiple-choice  questions  in  Fig.  16(b); 
the  screen  for  numeric  questions  is  similar.  The  user 
can  enter  a  new  question  for  the  current  test,  along 
with  a  set  of  allowed  answers.  If  the  question  is  also 
related  to  other  tests,  the  user  has  to  mark  them  in  the 
lower  box.  The  "Deleting  questions”  screen  is  for 
removing  old  questions,  which  allows  modification  of 
old  eligibility  criteria. 

The  mechanism  for  adding  general  questions, 
such  as  sex  and  age,  consists  of  five  screens 
(Fig.  11b),  and  the  user  adds  general  questions  in 
the  same  way  as  test-related  questions. 

4.3.  Eligibility  conditions 

We  next  describe  the  mechanism  for  entering  elig¬ 
ibility  criteria,  which  consists  of  four  screens 
(Fig.  12).  The  start  screen  allows  the  user  to  initi¬ 
alize  a  new  clinical  trial  and  view  the  criteria  for  old 
trials.  If  the  user  needs  to  modify  a  clinical  trial,  the 
system  first  displays  the  test-selection  screen 
(Fig.  18).  The  user  then  chooses  related  tests  and 


question  types,  and  clicks  "Continue”  to  get  the 
question  list. 

The  next  screen  (Fig.  19)  allows  the  user  to  select 
specific  questions  and  mark  answers  that  make  a 
patient  eligible.  For  a  multiple-choice  question,  the 
user  may  specify  several  eligibility  options;  for 
example,  a  patient  may  be  eligible  if  her  cancer 
stage  is  II  or  III.  For  a  numeric  question,  the  user  has 
to  specify  a  range  of  values;  for  instance,  a  patient 
may  be  eligible  if  her  age  is  between  0  and  45  years. 
If  the  user  clicks  "Simple  questions,”  the  system 
generates  a  conjunction  of  the  selected  criteria.  If 
the  eligibility  conditions  involve  a  more  complex 
expression, the user has to click “Combined question”
and then use the screen for composing logical
expressions (Fig. 20).

The  system  combines  the  eligibility  criteria  into 
an  acceptance  expression,  and  then  generates  the 
corresponding  rejection  expression  by  recursive 
application  of  DeMorgan’s  laws.  If  the  system  uses 
the  cost-reduction  heuristics,  it  converts  these 
expressions  into  disjunctive  normal  form  using  a 
standard  conversion  algorithm  [47,48]. 
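
The recursive application of DeMorgan's laws can be sketched as follows. This is an illustrative sketch, not the system's code: the tuple encoding, the function name, and the answer domains are assumptions. To keep the result free of negations, as in Fig. 2(b), the sketch swaps "and" with "or", replaces each set of allowed answers by its complement within the question's answer options, and flips each numeric comparison.

```python
# Hypothetical answer options for the non-numeric questions.
DOMAINS = {
    "sex": {"FEMALE", "MALE"},
    "cancer-stage": {"I", "II", "III", "IV"},
    "invasive-cancer": {"YES", "NO"},
    "arrhythmias": {"YES", "NO"},
}
FLIP = {"<=": ">", ">": "<=", "<": ">=", ">=": "<"}


def complement(expr):
    """Return the negation-free logical complement of an expression.

    expr is ("and", [subexprs]), ("or", [subexprs]),
    ("in", variable, allowed_answers), or (comparison, variable, bound).
    """
    op = expr[0]
    if op == "and":   # not (A and B)  ->  (not A) or (not B)
        return ("or", [complement(e) for e in expr[1]])
    if op == "or":    # not (A or B)   ->  (not A) and (not B)
        return ("and", [complement(e) for e in expr[1]])
    if op == "in":    # use the remaining answer options
        _, var, allowed = expr
        return ("in", var, frozenset(DOMAINS[var]) - frozenset(allowed))
    _, var, bound = expr  # numeric comparison
    return (FLIP[op], var, bound)


# Acceptance expression of Fig. 2(a), reading "<" as "at most".
ACCEPT = ("and", [
    ("in", "sex", frozenset({"FEMALE"})),
    ("<=", "age", 45),
    ("in", "cancer-stage", frozenset({"II", "III"})),
    ("in", "invasive-cancer", frozenset({"NO"})),
    ("<=", "lymph-nodes", 3),
    ("or", [
        ("in", "arrhythmias", frozenset({"NO"})),
        ("<=", "tumor-diameter", 2.5),
    ]),
])

REJECT = complement(ACCEPT)
for clause in REJECT[1]:
    print(clause)
```

Applying the function twice returns the original acceptance expression, which is a quick sanity check that the two expressions are complements of each other.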

4.4.  Entry  time 

We  have  run  experiments  with  16  novice  users,  who 
had  no  prior  experience  with  the  interface.  First, 
every  user  has  entered  four  sets  of  medical  tests; 
each  set  has  included  three  tests  and  10  questions. 
Then,  each  user  has  added  eligibility  expressions  for 
10  clinical  trials  used  at  the  Moffitt  Cancer  Center; 
the  number  of  questions  in  an  eligibility  expression 
has  varied  from  10  to  35. 

We have measured the entry time for each test
set and each clinical trial. In Fig. 21, we show the
mean time for every test set and the time per
question for the same sets. All users have entered
the test sets in the same order; since they had no
prior experience, their performance has improved
during the experiment. In Fig. 22, we give similar
graphs for the entry of trials.


Figure 21 Entry time for test sets (left) and the mean time per question for each set (right). We plot the average time
(dashed lines) and the time of the fastest and slowest users (vertical bars).

Figure 22 Entry time for eligibility criteria. We show the average time for each clinical trial and the time per
question (dashed lines), along with the performance of the fastest and slowest users (vertical bars).

The  experiments  have  shown  that  novices  can 
efficiently  use  the  interface;  they  quickly  learn  its 
full  functionality,  and  their  learning  curve  reaches  a 
plateau  after  about  an  hour.  The  average  time  per 
question  is  31  s  for  the  entry  of  medical  tests  and 
37  s  for  eligibility  criteria,  which  means  that  a  user 
can  enter  all  150  cancer  trials  used  at  Moffitt  in 
about  2  weeks. 


5.  Conclusions 

We  have  developed  an  expert  system  that  assigns 
cancer  patients  to  clinical  trials.  We  have  described 
the  representation  of  selection  criteria,  heuristics 
for  ordering  of  tests,  and  a  web-based  interface  for 
entering  patients’  data,  which  will  enable  physi¬ 
cians  across  the  country  to  access  a  central  repo¬ 
sitory  of  clinical  trials.  The  system  also  includes  an 
interface  for  extending  its  knowledge  base,  which 
allows a user to enter a new trial in 10-20 min.
Novices  can  use  the  interface  without  prior  instruc¬ 
tions,  and  they  reach  their  full  speed  after  about  an 
hour.  Although  cancer  research  has  provided  the 
motivation  for  this  work,  the  system  is  not  limited 
to  cancer,  and  we  can  use  it  for  trials  related  to 
other  diseases. 

The system uses logical eligibility expressions,
similar to those in EON [13] and ONCODOC [15,16]; this
approach is different from AIDS² [14] and Theocharous’s
system [21,22], which are based on Bayesian
networks.  The  use  of  logical  expressions  ensures 
scalability  and  ease  of  adding  new  trials,  but  it  does 
not  allow  probabilistic  decisions  based  on  incom¬ 
plete  data. 

We  have  applied  the  system  to  the  data  from  261 
breast-cancer  patients  admitted  to  the  Moffitt  Can¬ 
cer  Center  in  the  last  3  years.  The  experiments  have 
confirmed  that  the  system  can  improve  the  speed 


and  accuracy  of  selecting  trial  participants.  The 
results  suggest  that  physicians  miss  about  60%  of 
matching  trials,  which  means  that  the  system  can 
increase  the  number  of  participants  by  a  factor  of 
2.5.  These  results  are  consistent  with  the  studies  of 
the  manual  trial  selection  [3-8],  which  confirm  that 
clinicians  miss  up  to  60%  of  matches.  They  are  also 
consistent with the experiment on using ONCODOC at
two French hospitals, which has increased the number
of selected matches by a factor of 3 [17,18]. We
have been unable to compare the results with those
of AIDS², EON, and Theocharous’s system, because the
authors  of  these  systems  have  not  reported  large- 
scale  clinical  experiments. 

The  developed  system  includes  heuristics  for  the 
ordering  of  medical  tests,  which  is  an  advantage 
over  the  other  trial-selection  systems.  The  experi¬ 
ments  have  shown  that  the  ordering  of  tests  affects 
their  overall  cost,  and  the  implemented  heuristics 
reduce  this  cost. 

We  now  point  out  some  limitations  of  the  devel¬ 
oped  system  and  related  future  challenges.  First, 
the  system  does  not  access  patient  data  in  the 
Moffitt  clinical  database,  and  nurses  have  to  enter 
all  relevant  information  through  the  system’s 
interface.  The  data  in  the  clinical  database  are 
mostly  in  natural  language,  as  dictated  by  physi¬ 
cians;  we  plan  to  develop  a  mechanism  for  trans¬ 
ferring  these  data  into  the  trial-selection  system, 
which  will  require  domain-specific  tools  for  nat¬ 
ural-language  processing.  Second,  the  system 
does  not  keep  track  of  temporal  changes  in  the 
data;  for  example,  it  does  not  update  a  patient’s 
age,  and  does  not  flag  the  out-of-date  test  results. 
We  have  recently  designed  a  mechanism  for  tem¬ 
poral  reasoning,  and  we  plan  to  integrate  it  with 
the  system.  Third,  the  test-ordering  heuristics  do 
not  account  for  the  probabilities  of  possible  test 
results,  and  we  are  presently  working  on  the  inte¬ 
gration  of  the  current  heuristics  with  probabilistic 
reasoning.

Abstract - When clinicians test a new treatment procedure,
they need to identify and recruit patients with
appropriate medical conditions. We have developed an
expert system that helps clinicians select patients for
experimental treatments and reduce the number and
overall cost of related medical tests. We describe experiments
on selecting patients for new treatments at the
Moffitt Cancer Center. The experiments have shown
that the system can increase the number of selected patients
by a factor of three, and that it can also reduce
the cost of the selection process.

Keywords:  Medical  expert  systems,  breast  cancer, 
cost  reduction. 

1  Introduction 

When  clinicians  conduct  treatment  experiments, 
called  clinical  trials,  they  have  to  recruit  participants 
from  current  patients.  To  select  prospective  partici¬ 
pants,  clinicians  analyze  the  data  of  available  patients 
and identify patients with appropriate medical conditions.
This analysis has traditionally been a manual
process,  and  studies  have  shown  that  clinicians  miss  up 
to  60%  of  the  matching  patients,  which  delays  the  com¬ 
pletion  of  clinical  trials  [7,  17]. 

To address this problem, several researchers built expert
systems to help clinicians select trial participants.
Ohno-Machado et al. developed the AIDS² system, which
selected AIDS patients for clinical trials [11]. Musen et
al. built a rule-based system, called EON, that also selected
AIDS trial participants [8]. Bouaud et al. created
the ONCODOC system, which suggested trials for cancer
patients [2, 3]. Seroussi et al. used ONCODOC to identify
trial participants at two hospitals, which helped increase
the number of selected patients by a factor of
three [13, 14, 15].

The National Cancer Institute created a search engine for
selecting clinical trials, available through the Internet at
www.cancer.gov/search/clinical_trials. It prompts a user to
answer several questions about a patient, and gives a list of
potentially matching trials; however, it does not determine
whether the patient satisfies all of the requirements of
these trials.

Fallowfield et al. studied how physicians selected cancer
patients for clinical trials, and compared manual and
automated selection [5]. They showed that expert systems
could improve the selection accuracy, but physicians were
reluctant to use these systems. Carlson et al. conducted
similar studies with AIDS trials, and also concluded that
expert systems could lead to a more accurate selection [4].

A recent project at the University of South Florida has also
been aimed at automated identification of prospective trial
participants. Theocharous developed a Bayesian system that
selected clinical trials for cancer patients [12, 16], and
Bhanja et al. built a qualitative rule-based system for the
same task [1].

We have continued their work, built a new version of the
rule-based system [6, 9, 10], and applied it to selecting
patients for breast-cancer trials at the Moffitt Cancer
Center, located on the campus of the University of South
Florida. We outline the design of this system and present an
empirical evaluation of its effectiveness.

2  Knowledge  base 

Physicians at the Moffitt Cancer Center currently conduct
about 150 clinical trials. We have developed an expert system
to help physicians select trials for eligible patients; it
consists of a knowledge base and a web-based interface for
entering patient data. The knowledge base contains
information about related medical




(a)  MEDICAL  TESTS 

General  information 
What  is  the  patient’s  sex? 

What  is  the  patient’s  age? 

Mammogram,  Cost  is  $150 
What  is  the  cancer  stage? 

Does  the  patient  have  invasive  cancer? 

Biopsy,  Cost  is  $400 

How  many  lymph  nodes  have  tumor  cells? 

What  is  the  greatest  tumor  diameter? 

Electrocardiogram,  Cost  is  $200 

Does  the  patient  have  cardiac  arrhythmias? 

(b)  ELIGIBILITY  CRITERIA 

sex = FEMALE and
age < 45 and
cancer-stage ∈ {II, III} and
invasive-cancer = NO and
lymph-nodes < 3 and
(arrhythmias = NO or
tumor-diameter < 2.5)

Figure  1:  Description  of  medical  tests  and  trial- 
eligibility  criteria  in  the  trial-selection  system. 

tests, as well as logical expressions that determine a
patient’s eligibility for each trial. The description of a
medical test includes its dollar cost and list of questions
that can be answered based on the test results (Figure 1a).
The trial-eligibility criteria are represented by a logical
expression, which includes variables that represent the
patient data, as well as equalities, inequalities,
“set-element” relations, conjunctions, and disjunctions
(Figure 1b).

The system collects data until it can determine whether the
eligibility expression is TRUE or FALSE. For example, if a
clinician uses the system to determine a patient’s
eligibility for the trial in Figure 1(b), it first asks about
the patient’s sex and age. If the patient satisfies the
corresponding conditions, it asks for the mammogram results,
and then requests the biopsy and electrocardiogram data. The
ordering of tests depends on their costs and on the amount of
information provided by test results. The system begins with
the mammogram because it is cheaper than the other tests and
provides data for two clauses of the eligibility expression.
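The idea of collecting data until the expression becomes TRUE or FALSE can be sketched with three-valued logic, where a clause whose answer is still missing evaluates to "unknown". The code below is only an illustration under stated assumptions: the function names, the dictionary representation of patient data, and the reading of the cancer stages in Figure 1(b) as II and III are ours and are not taken from the described system.

```python
# Three-valued logic: True, False, or None (answer not yet available).
def and3(*values):
    if any(v is False for v in values):
        return False            # one failed clause decides a conjunction
    if all(v is True for v in values):
        return True
    return None                 # still undetermined

def or3(*values):
    if any(v is True for v in values):
        return True             # one satisfied clause decides a disjunction
    if all(v is False for v in values):
        return False
    return None

def clause(data, name, predicate):
    """Evaluate one clause; None means the question is unanswered."""
    if name not in data:
        return None
    return predicate(data[name])

def eligible(data):
    """Eligibility expression of Figure 1(b); variable names are illustrative."""
    return and3(
        clause(data, "sex", lambda v: v == "FEMALE"),
        clause(data, "age", lambda v: v < 45),
        clause(data, "cancer-stage", lambda v: v in {"II", "III"}),
        clause(data, "invasive-cancer", lambda v: v == "NO"),
        clause(data, "lymph-nodes", lambda v: v < 3),
        or3(
            clause(data, "arrhythmias", lambda v: v == "NO"),
            clause(data, "tumor-diameter", lambda v: v < 2.5),
        ),
    )
```

With this representation, a single failed clause such as sex = MALE makes the whole expression FALSE, so no further questions or tests are needed; while the result is unknown, more data must be collected.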

3  Selection  of  participants 

We have built a knowledge base for the breast-cancer trials
at the Moffitt Cancer Center, including five completed trials
and ten current trials, and applied the system to
retrospective data from the Moffitt patients who have had
breast-cancer surgery in the last three years. We have
discarded the patients whose available records are
incomplete, and used all remaining patients, which include
187 past patients and 169 current patients.

We have compared the results of automated trial selection for
these patients with the manual selection by Moffitt
clinicians. The system has identified all eligible patients
for each trial, whereas the clinicians have selected about
one-third of the eligible patients. We summarize the results
for the past patients in Table 1(a), and the results for the
current patients in Table 1(b). The “participants” column
shows the number of actual participants of each trial; the
“other eligible” column gives the number of the other
eligible patients identified by the system.

For  every  current  patient  who  did  not  participate  in 
a  matching  trial,  we  have  checked  whether  she  partic¬ 
ipated  in  any  other  trial,  and  we  show  the  results  in 
Table  2.  We  have  not  done  a  similar  analysis  for  the 
past  patients  due  to  insufficient  data.  The  “incompati¬ 
ble”  column  in  Table  2  includes  the  number  of  eligible 
patients  who  did  not  participate  in  a  specified  trial  be¬ 
cause  of  participation  in  another  incompatible  trial.  The 
“compatible”  column  shows  the  number  of  patients  who 
participated  in  another  compatible  trial,  and  could  also 
have  participated  in  the  specified  trial.  Finally,  the  “no 
other  trial”  column  gives  the  number  of  eligible  patients 
who  have  not  participated  in  any  trial. 

The  results  show  that  the  system  can  identify  eligi¬ 
ble  patients  who  have  not  been  selected  by  clinicians; 
thus,  it  can  increase  the  number  of  trial  participants. 
For  the  patients  in  the  reported  experiments,  it  could 
increase  the  overall  number  of  participants  by  a  factor 
of  three.  In  particular,  it  has  found  prospective  par¬ 
ticipants  for  some  trials  with  a  very  small  number  of 
manually  selected  patients.  For  example,  it  has  found 
nineteen  matching  patients  for  trial  12385,  which  cur¬ 
rently  has  no  participants,  and  twenty-six  patients  for 
trial  11931,  which  has  only  two  participants. 

4  Cost  reduction 

If  the  available  patient  records  do  not  provide  enough 
data  for  trial  selection,  clinicians  perform  medical  tests 
as  part  of  the  selection  process.  They  can  reduce  the 
overall  test  cost  by  first  ordering  inexpensive  tests,  and 
then  using  their  results  to  avoid  some  expensive  tests. 

The  system  suggests  the  ordering  of  tests  that  reduces 
their  expected  cost.  After  getting  the  results  of  the  first 
test,  it  re-evaluates  the  need  for  the  other  tests  and  re¬ 
vises  their  ordering.  The  choice  of  the  first  test  is  based 
on  three  criteria.  The  system  scores  all  required  tests 
according  to  these  criteria,  computes  a  linear  combina¬ 
tion  of  the  three  scores  for  every  test,  and  chooses  the 
test  with  the  highest  score. 
Table 1: Results of selecting clinical trials for the 187
past patients and 169 current patients. We give the number of
trial participants, selected by both the system and Moffitt
clinicians, and the number of the other eligible patients,
identified by the system.

(a) Results for the 187 past patients.
(b) Results for the 169 current patients.

Table 3: Cost savings by test reordering.

(a) Results for the 187 past patients.
(b) Results for the 169 current patients.

Table 2: Participation of the patients who skipped a matching
clinical trial in other trials. We show the number of
patients who skipped the trial because of participation in
another incompatible trial; the number of patients who were
on another trial compatible with the skipped trial; and the
number of eligible patients who were not on any trial.

(a) Results for the 187 past patients.
(b) Results for the 169 current patients.

Figure 2: Costs with and without test reordering. We plot the
results for the six clinical trials that have incurred
nonzero selection costs.
1. Cost of a test. The system gives preference to less
expensive tests.

2. Immediate decision. If a test can lead to an immediate
acceptance or rejection of the trial, the system prefers it
to other tests.

3. Number of related clauses. The system prefers the tests
that provide data for a large number of clauses in the
eligibility expression.

The  system  disregards  the  costs  of  tests  performed  in 
the  normal  course  of  treatment,  and  accounts  only  for 
the  costs  related  to  the  trial  selection.  For  example,  if 
a  patient  needs  a  mammogram  regardless  of  trial  par¬ 
ticipation,  the  system  views  it  as  a  zero-cost  test.  On 
the  other  hand,  if  the  only  purpose  of  the  biopsy  and 
electrocardiogram  is  to  select  trials,  the  system  uses  the 
heuristics  to  order  these  tests. 
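The choice of the first test can be sketched as a weighted sum of the three criteria, with routine tests treated as zero-cost. This is only a rough illustration: the text does not give the weights or the scaling of the individual scores, so the unit weights, the cost normalization, and the names `MedicalTest`, `score`, and `choose_first_test` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class MedicalTest:
    name: str
    cost: float            # dollar cost of the test
    immediate: bool        # can one result accept or reject the trial?
    clauses: int           # number of clauses the test provides data for
    routine: bool = False  # performed anyway in the normal course of treatment

def effective_cost(test):
    # Tests needed regardless of trial participation count as zero-cost.
    return 0.0 if test.routine else test.cost

def score(test, max_cost, w_cost=1.0, w_immediate=1.0, w_clauses=1.0):
    # Assumed normalization: the cheapest possible test scores 1, the
    # most expensive required test scores 0.
    cost_score = 1.0 - effective_cost(test) / max_cost if max_cost > 0 else 1.0
    return (w_cost * cost_score
            + w_immediate * float(test.immediate)
            + w_clauses * test.clauses)

def choose_first_test(tests, **weights):
    max_cost = max(effective_cost(t) for t in tests)
    return max(tests, key=lambda t: score(t, max_cost, **weights))

# Tests of Figure 1(a); the "immediate" flags are our reading of Figure 1(b).
mammogram = MedicalTest("mammogram", 150, True, 2)
biopsy = MedicalTest("biopsy", 400, True, 2)
ecg = MedicalTest("electrocardiogram", 200, False, 1)
first = choose_first_test([mammogram, biopsy, ecg])
```

Under these assumed weights the mammogram is chosen first, which agrees with the example in Section 2; after its results arrive, the remaining tests would be scored again.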

We  show  the  mean  test  costs  with  and  without  the 
ordering  heuristics  in  Table  3,  and  give  a  graphical  view 
of  the  cost  savings  in  Figure  2.  The  results  confirm 
that  the  heuristics  reduce  the  cost  of  the  selection  pro¬ 
cess.  Six  clinical  trials  have  incurred  selection  costs;  the 
heuristics  have  reduced  the  costs  for  four  of  these  trials, 
and  have  not  affected  the  costs  for  the  other  two  trials. 

The  results  in  Table  3(a)  differ  from  similar  experi¬ 
ments  with  an  earlier  version  of  the  system  [6],  because 
of  two  changes  to  the  system.  First,  the  current  version 
disregards  the  costs  of  the  tests  required  for  the  reg¬ 
ular  treatment,  which  do  not  affect  the  trial-selection 
expenses,  whereas  the  earlier  version  counted  all  costs. 
Second,  some  costs  in  the  old  system  were  out-of-date, 
and  we  have  corrected  them  based  on  the  data  from  the 
Moffitt  accounting  department. 

5  Reduction  of  data  entry 

The  system  tries  to  minimize  not  only  the  overall  cost 
of  medical  tests,  but  also  the  amount  of  data  entry,  that 
is,  the  number  of  questions  asked  about  a  patient.  For 
each question, it estimates the probability that the answer
will lead to an immediate acceptance or rejection
of  the  trial,  and  it  gives  preference  to  the  questions  with 
the  highest  probability  of  an  immediate  decision.  Thus, 
when  a  clinician  enters  the  available  data,  the  system 
asks  the  related  questions  in  the  decreasing  order  of 
the  immediate-decision  probabilities.  It  estimates  these 
probabilities  from  past  experience  with  other  patients. 
For  each  question,  it  determines  the  percentage  of  past 
answers  that  have  led  to  immediate  decisions,  and  uses 
this  percentage  as  the  probability  estimate. 
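The frequency estimate described above can be sketched as follows. The representation of past experience as (question, decided) pairs and the function names are assumptions made for illustration; questions that have never been asked are given probability zero here, which is also our choice.

```python
def immediate_decision_probabilities(history):
    """history: iterable of (question, led_to_immediate_decision) pairs."""
    asked, decided = {}, {}
    for question, was_decisive in history:
        asked[question] = asked.get(question, 0) + 1
        decided[question] = decided.get(question, 0) + int(was_decisive)
    # Fraction of past answers that led to an immediate decision.
    return {q: decided[q] / asked[q] for q in asked}

def order_questions(questions, probabilities):
    """Ask the questions in decreasing order of the estimated probability."""
    return sorted(questions, key=lambda q: probabilities.get(q, 0.0),
                  reverse=True)

history = [("sex", True), ("sex", False), ("age", False),
           ("stage", True), ("stage", True)]
probabilities = immediate_decision_probabilities(history)
ordering = order_questions(["sex", "age", "stage"], probabilities)
```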

We  have  evaluated  the  effectiveness  of  this  ordering 
heuristic  for  six  clinical  trials,  using  the  data  from  the 
169  current  patients.  We  have  performed  ten-fold  cross- 
validation;  that  is,  we  have  used  90%  of  the  patients  to 
compute  the  related  probabilities,  and  then  measured 
the  mean  number  of  questions  for  the  other  10%. 
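A minimal sketch of the ten-fold split, assuming the patients are held in a list and assigned to folds by position; how the patients were actually partitioned is not stated, and the function name is hypothetical.

```python
def ten_fold_splits(patients, folds=10):
    """Yield (training, held_out) pairs; each patient is held out once."""
    for i in range(folds):
        held_out = patients[i::folds]
        training = [p for j, p in enumerate(patients) if j % folds != i]
        yield training, held_out

patients = list(range(169))          # stand-ins for the 169 current patients
splits = list(ten_fold_splits(patients))
```

For each split, the probabilities would be estimated on the training part (about 90% of the patients) and the mean number of questions measured on the held-out part.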

We show the results with and without the ordering heuristic
in Table 4, and give a graphical view of the same results in
Figure 3. The heuristic has reduced the number of questions
for all six trials; the reduction ranges from 1% to 29%, and
its mean is 15%. The results confirm that the accumulated
statistical data help reduce the number of questions.

Table 4: Reduction of data entry by the reordering of
questions, for the 169 current patients.

Figure 3: Number of questions with and without the
reordering heuristic, for the 169 current patients.

6  Concluding  remarks 

We  have  developed  an  expert  system  that  selects 
clinical  trials  for  eligible  patients.  Experiments  have 
confirmed that the system can increase the number of
clinical-trial  participants.  They  have  also  shown  that 
the  ordering  of  related  medical  tests  affects  the  over¬ 
all  test  cost,  and  the  implemented  heuristics  can  reduce 
this  cost. 
Abstract — The purpose of a clinical trial is to evaluate a
new treatment procedure. When medical researchers conduct a
trial, they recruit participants with appropriate medical
histories. To select participants, the researchers analyze
medical records of the available patients, which has
traditionally been a manual procedure. We describe an
intelligent agent that helps to select patients for clinical
trials. If the available data are insufficient for choosing
patients, the agent suggests additional medical tests and
finds an ordering of the tests that reduces their total cost.

Keywords — Medical expert systems, automated diagnosis,
clinical trials.

I. Introduction

A  clinical  trial  is  an  experiment  with  a  new  treat¬ 
ment  procedure.  When  medical  researchers  test  a  new 
treatment,  they  recruit  patients  with  appropriate  health 
problems  and  medical  histories.  The  selection  of  pa¬ 
tients  has  traditionally  been  a  manual  procedure,  and 
recent  studies  have  shown  that  clinicians  can  miss  up  to 
60%  of  the  eligible  patients  [9,  10,  14,  26,  35,  38]. 

If the available records do not provide enough data,
clinicians perform medical tests as part of the selection
process. The costs of most tests have declined over the last
decade, but the number of tests has significantly increased
[33, 36], which is partially due to inappropriate ordering of
tests [1, 25]. Clinicians can reduce the cost by first
requiring inexpensive tests and then using their results to
avoid some expensive tests; however, finding the right
ordering may be a complex problem.

The purpose of the described work is to automate the
selection of patients for clinical trials and minimize the
cost of related tests. We have developed an agent that
identifies appropriate trials for each patient, and built a
knowledge base for breast-cancer trials.

II.  Previous  Work 

Researchers began to work on medical expert systems in the
early seventies. Shortliffe et al. developed the MYCIN
system, which diagnosed bacterial diseases [5, 30, 31]. Its
knowledge base consisted of if-then rules, which allowed for
the analysis of symptoms and evaluation of the certainty of
the diagnosis. Experiments showed that MYCIN correctly
diagnosed common diseases, which led to the development of
other medical systems [5, 19], such as NEOMYCIN, PUFF,
CENTAUR, and VM. Shortliffe et al. created a system for
selecting chemotherapy treatments, called ONCOCIN [32], which
also evolved from MYCIN.

Lucas et al. constructed a rule-based system for diagnosing
liver and biliary-tract diseases [16], but it often gave an
incorrect diagnosis [12, 23]. Korver and Lucas converted the
initial system into a Bayesian network, which improved its
performance [13, 15].

Musen et al. built a rule-based system, called EON, that
selected AIDS patients for clinical trials [20].
Ohno-Machado et al. developed the AIDS² system, which also
assigned AIDS patients to clinical trials [21]. They
integrated logical rules with Bayesian networks, which helped
to make decisions in the absence of some data.

Bouaud et al. created a cancer expert system, called ONCODOC,
that suggested alternative clinical trials for each patient
and allowed a physician to choose among them [3, 4]. Seroussi
et al. used ONCODOC to select participants for clinical
trials at two hospitals, which helped to increase the number
of selected patients by a factor of three [27, 28, 29].

Hammond and Sergot created the OaSiS architecture [11], which
combined the techniques from earlier systems, including EON
and ONCOCIN. Smith et al. built a system that assisted a
clinician in selecting medical tests and reducing their
number and cost [17, 18, 33].

Fallowfield et al. studied how physicians selected cancer
patients for clinical trials, and compared manual and
automatic selection [8]. They showed that expert systems
could improve the selection accuracy; however, their study
also revealed that physicians were reluctant to use these
systems. Carlson et al. conducted similar studies with AIDS
trials, and also concluded that expert systems could lead to
a more accurate selection [6].

Theocharous developed a Bayesian system that selected
clinical trials for cancer patients [24, 34]. It learned
conditional probabilities of medical-test outcomes and
evaluated the probability of a patient’s eligibility for each
trial. On the negative side, the available medical records
were often insufficient for learning accurate probabilities.
Furthermore, when adding a new clinical trial, the user had
to change the structure of the underlying Bayesian network.

To address these problems, Bhanja et al. built a rule-based
system for the same task [2]. We have continued that work,
extended the system, and added a mechanism for reducing costs
involved in patient selection.

III.  Example 

We have developed an intelligent agent that helps to select
clinical trials for eligible patients. It prompts a clinician
to enter the results of medical tests, and identifies
appropriate trials. If the available records do not provide
enough data, the agent suggests additional tests.

In Figure 1(a), we give a simplified example of eligibility
criteria for a clinical trial. This trial is for young and
middle-aged women with a noninvasive cancer at stage II or
III. When testing a patient’s eligibility, a clinician has to
order three medical tests (Figure 1b).


(a) Eligibility criteria

1. The patient is female.
2. She is at most forty-five years old.
3. Her cancer stage is II or III.
4. Her cancer is not invasive.
5. At most three lymph nodes have tumor cells.
6. Either
   • the patient has no cardiac arrhythmias, or
   • all tumors are smaller than 2.5 centimeters.

(b) Tests and questions

General information
   What is the patient’s sex?
   What is the patient’s age?

Mammogram, Cost is $150
   What is the cancer stage?
   Does the patient have invasive cancer?

Biopsy, Cost is $300
   What is the cancer stage?
   How many lymph nodes have tumor cells?
   What is the greatest tumor size?

Electrocardiogram, Cost is $200
   Does the patient have cardiac arrhythmias?

Fig. 1. Example of eligibility criteria, tests, and questions.


(a) Acceptance

sex = FEMALE and
age ≤ 45 and
stage ∈ {II, III} and
invasive = NO and
lymph-nodes ≤ 3 and
(arrhythmias = NO or
tumor-size < 2.5)

(b) Rejection

sex = MALE or
age > 45 or
stage ∈ {I, IV} or
invasive = YES or
lymph-nodes > 3 or
(arrhythmias = YES and
tumor-size ≥ 2.5)

Fig. 2. Logical expressions for the criteria in Figure 1(a).

The agent first prompts a clinician to enter the patient’s
sex and age. If the patient satisfies the corresponding
conditions, the agent asks for the mammogram results and
verifies Conditions 3 and 4; then, it requests the biopsy and
electrocardiogram data. If the patient’s records already
include some test results, the clinician can answer the
corresponding questions while entering the personal data,
before the agent selects test procedures. For example, if the
records indicate that the cancer stage is IV, the clinician
can enter the stage along with sex and age, and then the
agent immediately determines that the patient is ineligible
for this trial.

IV.  Knowledge  Base 

The agent’s knowledge base includes questions, medical tests,
and logical expressions that represent eligibility criteria
for each trial. We give a simplified example of tests and
questions in Figure 1(b), and logical expressions in
Figure 2.


(sex = FEMALE and
 age ≤ 45 and
 stage ∈ {II, III} and
 invasive = NO and
 lymph-nodes ≤ 3 and
 arrhythmias = NO)
or
(sex = FEMALE and
 age ≤ 45 and
 stage ∈ {II, III} and
 invasive = NO and
 lymph-nodes ≤ 3 and
 tumor-size < 2.5)

Fig. 3. Disjunctive normal form of the acceptance expression.


The agent supports three types of questions: the first type
takes a yes/no response, the second is multiple choice, and
the third requires a numeric answer. For example, the cancer
stage is a multiple-choice question, and the tumor size is a
numeric question. The description of a medical test includes
the test name, dollar cost, and list of questions that can be
answered based on the test results (Figure 1).

We encode the eligibility for a clinical trial by a logical
expression that does not have negations, called the
acceptance expression. It includes variables that represent
medical data, as well as equalities, inequalities,
“set-element” relations, conjunctions, and disjunctions
(Figure 2a). In addition, the agent uses the logical
complement of the eligibility criteria, called the rejection
expression, which also does not have negations (Figure 2b).
It describes the conditions that make a patient ineligible
for the trial.
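One way to obtain a negation-free complement is to apply De Morgan's laws and push each negation into the atomic condition, replacing it with the opposite comparison or the remaining values of the variable. The sketch below illustrates this for the expressions of Figure 2; the tuple representation, the `DOMAINS` table of possible answers, and the function name are assumptions and are not taken from the described agent.

```python
# Possible answers of the multiple-choice and yes/no questions (assumed).
DOMAINS = {
    "sex": {"FEMALE", "MALE"},
    "stage": {"I", "II", "III", "IV"},
    "invasive": {"YES", "NO"},
    "arrhythmias": {"YES", "NO"},
}
# Opposite of each numeric comparison.
FLIP = {"<=": ">", ">": "<=", "<": ">=", ">=": "<"}

def complement(expr):
    """Return the negation-free logical complement of an and/or tree."""
    kind = expr[0]
    if kind == "and":                       # not (A and B) = not A or not B
        return ("or", [complement(e) for e in expr[1]])
    if kind == "or":                        # not (A or B) = not A and not B
        return ("and", [complement(e) for e in expr[1]])
    _, var, op, value = expr                # ("atom", var, op, value)
    if op in FLIP:
        return ("atom", var, FLIP[op], value)
    if op == "in":
        return ("atom", var, "in", frozenset(DOMAINS[var] - set(value)))
    if op == "=":
        rest = DOMAINS[var] - {value}
        if len(rest) == 1:
            return ("atom", var, "=", next(iter(rest)))
        return ("atom", var, "in", frozenset(rest))
    raise ValueError(f"unsupported operator: {op}")

acceptance = ("and", [
    ("atom", "sex", "=", "FEMALE"),
    ("atom", "age", "<=", 45),
    ("atom", "stage", "in", frozenset({"II", "III"})),
    ("atom", "invasive", "=", "NO"),
    ("atom", "lymph-nodes", "<=", 3),
    ("or", [("atom", "arrhythmias", "=", "NO"),
            ("atom", "tumor-size", "<", 2.5)]),
])
rejection = complement(acceptance)
```

Applied to the acceptance expression of Figure 2(a), this yields the structure of the rejection expression in Figure 2(b), and complementing the result again restores the original expression.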

The agent collects data until it can determine which of the
two expressions is true. For instance, if a patient’s sex is
MALE, then the rejection expression in Figure 2(b) is TRUE,
and the agent immediately determines that this trial is
inappropriate. If the sex is FEMALE, the agent asks more
questions.

If the knowledge base includes multiple clinical trials, the
agent checks a patient’s eligibility for each of them. It
first asks for the tests related to multiple trials, and then
requests additional tests for specific trials. After getting
each new answer, the agent re-evaluates the patient’s
eligibility for each trial.

V. Order of Tests

If a patient’s records do not include enough data, the agent
asks for additional tests; for example, if the records do not
provide data for the eligibility criteria in Figure 1, the
agent asks for the mammogram, biopsy, and electrocardiogram.
The total cost of tests may depend on their order; for
instance, if we begin with the mammogram, and it shows that
the cancer stage is IV, then we can immediately reject the
trial in Figure 1 and avoid the more expensive tests.

We have explored heuristics for ordering the tests, based on
the test costs and the structure of acceptance and rejection
expressions. The heuristics use a disjunctive normal form of
these expressions; that is, each expression must be a
disjunction of conjunctions. For example, the rejection
expression in Figure 2(b) is in disjunctive normal form,
whereas the acceptance expression in Figure 2(a) is not. If
the system uses ordering heuristics, it converts this
acceptance expression into the disjunctive normal form shown
in Figure 3.
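The conversion of a negation-free expression into a disjunction of conjunctions can be sketched by distributing conjunctions over disjunctions. The tree representation and the function name below are assumptions made for illustration; the agent's actual conversion procedure is not described in the text.

```python
from itertools import product

def to_dnf(expr):
    """Return a list of conjunctions; each conjunction is a list of atoms."""
    kind = expr[0]
    if kind == "or":
        # A disjunction contributes the conjunctions of all its branches.
        return [conj for sub in expr[1] for conj in to_dnf(sub)]
    if kind == "and":
        # Pick one conjunction from every branch and merge the picks.
        parts = [to_dnf(sub) for sub in expr[1]]
        return [sum(combo, []) for combo in product(*parts)]
    return [[expr]]                          # a single atom

def atom(text):
    return ("atom", text)

# Acceptance expression of Figure 2(a).
acceptance = ("and", [
    atom("sex = FEMALE"), atom("age <= 45"), atom("stage in {II, III}"),
    atom("invasive = NO"), atom("lymph-nodes <= 3"),
    ("or", [atom("arrhythmias = NO"), atom("tumor-size < 2.5")]),
])
dnf = to_dnf(acceptance)
```

For the acceptance expression of Figure 2(a), the result has two conjunctions of six conditions each, matching Figure 3. The number of conjunctions is the product of the sizes of the disjunctions, so an expression with three two-way disjunctions already expands to eight conjunctions.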


The  agent  chooses  the  order  of  tests  that  reduces  their 
expected  cost.  After  getting  the  results  of  the  first  test, 
it  re-evaluates  the  need  for  the  other  tests  and  revises 
their  ordering.  The  choice  of  the  first  test  is  based  on 
three  criteria.  The  agent  scores  all  required  tests  ac¬ 
cording  to  these  criteria,  computes  a  linear  combination 
of  the  three  scores  for  every  test,  and  chooses  the  test 
with  the  highest  score. 

1. Cost of the test. The agent prefers cheaper tests. For
instance, it may start with the mammogram, which is cheaper
than the other two tests in Figure 1.

2. Number of clinical trials that require the test. When the
agent checks a patient’s eligibility for several trials, it
prefers tests that provide data for the largest number of
trials. For example, if the electrocardiogram gives data for
two different trials, the agent may prefer it to the
mammogram despite its higher cost.

3. Number of clauses that include the test results. The agent
prefers the tests that provide data for the largest number of
clauses in the acceptance and rejection expressions. For
example, the mammogram data affect both clauses of the
acceptance expression in Figure 3 and two clauses of the
rejection expression in Figure 2(b). On the other hand, the
electrocardiogram affects only one clause of the acceptance
expression and one clause of the rejection expression; thus,
the agent should order it after the mammogram.

VI.  User  Interface 

The  agent  includes  a  web-based  interface  that  allows 
clinicians  to  enter  patients’  data  through  remote  com¬ 
puters;  the  interface  consists  of  five  screens  (Figure  4). 

The start screen is for adding and retrieving patients
(Figure 5). After a user enters a patient’s name, the agent
displays a list of the available trials (Figure 6). The user
can choose a subset of these trials, and then the agent
checks eligibility only for the selected trials. The next
screen is for basic personal and medical data, such as sex,
age, and cancer stage (Figure 7).

After the agent gets the basic data, it prompts the user for
medical information related to specific trials (Figure 8).
When the user enters medical data, the agent continuously
re-evaluates the patient’s eligibility and shows the decision
for each trial. If the patient is ineligible for some trials,
the user can find out the reasons by clicking the “Why”
button. The interface also includes a screen for the review
and modification of the previous answers, similar to the
screen in Figure 8.

VII.  Experiments 

We have built a knowledge base for the breast-cancer clinical
trials at the H. Lee Moffitt Cancer Center, applied the agent
to retrospective data from 187 past patients and 57 current
patients, and compared the results with manual selection by
clinicians at the cancer center.

We summarize the results for the past patients in Table I,
and the results for the current patients in Table II. The
“same matches” column includes the number of patients who
have been selected by both human clinicians and the automated
agent. The “new matches” column gives the number of patients
who have been matched


TABLE I

Results of matching 187 past patients.

Clinical    Same       New        Missing
Trial       Matches    Matches    Data
10822       10         5          0
10840       0          19         3
11072       48         26         19
11378       4          19         3
11992       5          6          0
12100       8          20         13
12101       20         30         0

TABLE II

Results of matching 57 current patients.

Clinical    Same       New        Missing
Trial       Matches    Matches    Data
11132       4          1          1
11971       3          0          0
12100       0          2          0
12101       4          21         0
12601       0          1          0
11931       1          8          0
12775       1          4          0

by the agent but potentially missed by human clinicians.
Finally, the last column shows the number of patients whose
available records are incomplete. Clinicians have found
trials for these patients, but the agent cannot identify
these matches because of missing data. The agent has found a
number of matches potentially missed by human clinicians;
thus, it can help to recruit more patients for clinical
trials.

In Table III, we give the mean test costs with and without
the ordering heuristics for the 187 past patients. The
results show that the implemented heuristics reduce the costs
by more than a factor of two.
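As a rough check of this claim, we can add up the per-trial mean costs listed in Table III. Summing the per-trial means is our own simplification, and the garbled entry for trial 11072 is read as $556; the paper may have aggregated the costs differently.

```python
# Mean dollar cost per trial, copied from Table III.
without_reordering = {10822: 20, 10840: 0, 11072: 556, 11378: 34,
                      11992: 87, 12100: 0, 12101: 24}
with_reordering = {10822: 8, 10840: 0, 11072: 194, 11378: 0,
                   11992: 34, 12100: 0, 12101: 22}

total_without = sum(without_reordering.values())   # 721
total_with = sum(with_reordering.values())         # 258
ratio = total_without / total_with                 # about 2.8
```

Under this simplification, the totals are $721 without reordering and $258 with it, a ratio of about 2.8, which is consistent with a reduction by more than a factor of two.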

VIII. Scalability

The time complexity of evaluating the acceptance and
rejection expressions is linear in their size. Experiments on
a Sun Ultra 10 have shown that the evaluation takes about
0.02 seconds per question, and the time is linear in the
number of questions. Typical eligibility conditions for a
clinical trial include ten to thirty questions; thus, the
evaluation time is 0.2 to 0.6 seconds per trial.


TABLE III

Cost savings by test reordering.

            Average Dollar Cost
Clinical    Without Test    With Test
Trial       Reordering      Reordering
10822       $20             $8
10840       $0              $0
11072       $556            $194
11378       $34             $0
11992       $87             $34
12100       $0              $0
12101       $24             $22

[Diagram of the five interface screens]

Adding patients: add a new patient; find an old patient.
Selecting clinical trials: choose candidate trials; view available trials.
Entering initial data: answer initial questions; change previous answers.
Entering medical data: enter test results; view eligibility decisions.
Revising medical data: view test results; change some results.

Fig. 4. Entering a patient’s data. The web-based interface for data entry consists of five screens. We show these screens by
rectangles and the transitions between them by arrows.


Fig.  5.  Adding  new  patients  and  retrieving  existing  patients. 


Fig. 6. Selecting clinical trials.




Fig.  7.  Entering  basic  information  for  a  patient. 



Fig.  8.  Entering  medical  data. 


(a) Eligibility criteria

1. The patient is female.

2. She is at most forty-five years old.

3. Either
   • her cancer is not invasive, or
   • her cancer is not recurrent.

4. Either
   • at most three lymph nodes have tumor cells, or
   • all tumors are smaller than 2.5 centimeters.

5. Either
   • the patient has no cardiac arrhythmias, or
   • the patient has no congenital heart disease.

(b) Acceptance expression

sex = FEMALE and
age ≤ 45 and
(invasive = NO or recurrent = NO) and
(lymph-nodes ≤ 3 or tumor-size < 2.5) and
(arrhythmias = NO or congenital = NO)

(c) Reduced expression

sex = FEMALE and
age ≤ 45 and
invasive-and-recurrent = NO and
(lymph-nodes ≤ 3 or tumor-size < 2.5) and
arrhythmias-and-congenital = NO


Fig. 9. Reducing the number of disjunctions. The conversion
of the eligibility criteria (a) into a logical expression (b)
leads to an explosion in the size of the corresponding
disjunctive normal form. We can prevent the explosion
by replacing some disjunctions with single questions (c).


The linear scalability is an important advantage over
Bayesian systems, which do not scale to a large number
of clinical trials [7, 21, 23]. The authors of these systems
have reported that the sizes of the underlying networks
are superlinear in the number of trials [22, 37], and the
training time is superlinear in the network size [24, 34].

If the agent uses the cost-reduction heuristics, it converts
the acceptance and rejection expressions into disjunctive
normal form, which can potentially lead to an explosion in
their size. For example, if the eligibility conditions are
as shown in Figure 9(a), the agent initially generates the
expression in Figure 9(b). This expression contains three
two-way disjunctions, so its conversion to disjunctive
normal form yields 2 × 2 × 2 = 8 clauses.

Although the conversion may result in impractically
large expressions, experiments have shown that this
problem does not arise in practice because the number
of nested disjunctions is usually small. Furthermore,
we can eliminate some disjunctions by combining their
elements into longer questions. For instance, we can
represent Condition 3 in Figure 9(a) by a single question:
“Does the patient have both invasive and recurrent
cancer?” If we apply this modification to Conditions 3
and 5, then we obtain the expression in Figure 9(c), and
its conversion to disjunctive normal form results in an
expression with two clauses.
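
The clause counts quoted above can be checked with a short sketch. It assumes a simplified representation that is not taken from the agent itself: an expression is a conjunction of disjunctions, written as a list of lists of atom strings, and the function name `dnf_clauses` is hypothetical.

```python
from itertools import product


def dnf_clauses(conjuncts):
    """Expand a conjunction of disjunctions into disjunctive normal form.

    `conjuncts` is a list of lists; each inner list holds the alternative
    atoms of one disjunction (a one-element list is a plain atom).
    Each returned tuple is one DNF clause, that is, one conjunction of atoms.
    """
    return list(product(*conjuncts))


# Figure 9(b): three two-way disjunctions (illustrative encoding).
figure_9b = [
    ["sex = FEMALE"],
    ["age <= 45"],
    ["invasive = NO", "recurrent = NO"],
    ["lymph-nodes <= 3", "tumor-size < 2.5"],
    ["arrhythmias = NO", "congenital = NO"],
]

# Figure 9(c): Conditions 3 and 5 replaced by single questions.
figure_9c = [
    ["sex = FEMALE"],
    ["age <= 45"],
    ["invasive-and-recurrent = NO"],
    ["lymph-nodes <= 3", "tumor-size < 2.5"],
    ["arrhythmias-and-congenital = NO"],
]

print(len(dnf_clauses(figure_9b)), len(dnf_clauses(figure_9c)))
```

The number of clauses is the product of the disjunction sizes, which is why each disjunction that is merged into a single question halves the size of the result in this example.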


IX.  Concluding  Remarks 

We have developed an agent that automatically assigns
patients to clinical trials. We have described the
representation of selection criteria, heuristics for ordering
of tests, and a web-based interface for entering patients’
data, which will enable physicians across the
country to access a central repository of clinical trials.

Experiments have confirmed that the agent has the
potential to find more participants for clinical trials.
They have also shown that the ordering of medical tests
affects their overall cost, and the implemented heuristics
can reduce the cost of finding trial participants. The
heuristics do not account for the probabilities of possible
test results, and we plan to add probabilistic reasoning
as part of the future work.

Acknowledgments: This work has been partially supported
by the Breast Cancer Research Program of the
U.S. Army Medical Research and Materiel Command
under contract DAMD 17-09-1-0244, and by the H. Lee
Moffitt Cancer Center.

References 

[1] D. Bareford and A. Hayling. Inappropriate use of
laboratory services: Long term combined approach
to modify request patterns. British Medical Journal,
301(6764):1305-1307, 1990.

[2] Sanjukta Bhanja, Lynn M. Fletcher, Lawrence O. Hall,
Dmitry B. Goldgof, and Jeffrey P. Krischer. A qualitative
expert system for clinical trial assignment. In
Proceedings of the Eleventh International Florida
Artificial Intelligence Research Society Conference, pages
84-88, 1998.

[3] Jacques Bouaud, Brigitte Séroussi, Éric-Charles Antoine,
Mary Gozy, David Khayat, and Jean-François
Boisvieux. Hypertextual navigation operationalizing
generic clinical practice guidelines for patient-specific
therapeutic decisions. Journal of the American Medical
Informatics Association, 5(suppl.):488-492, 1998.

[4] Jacques Bouaud, Brigitte Séroussi, Éric-Charles Antoine,
Laurent Zelek, and Marc Spielmann. Reusing
ONCODOC, a guideline-based decision support system,
across institutions: A successful experiment in sharing
medical knowledge. In Proceedings of the American
Medical Informatics Association Annual Symposium,
volume 7, 2000.

[5] Bruce G. Buchanan and Edward H. Shortliffe. Rule-Based
Expert Systems: The MYCIN Experiments of the
Stanford Heuristic Programming Project. Addison-Wesley,
Reading, MA, 1984.

[6] Robert W. Carlson, Samson W. Tu, Nancy M. Lane,
Tze L. Lai, Carol A. Kemper, Mark A. Musen, and
Edward H. Shortliffe. Computer-based screening of patients
with HIV/AIDS for clinical trial eligibility. Online
Journal of Current Clinical Trials, 4(179), 1995.

[7] Francisco J. Díez, José Mira, E. Iturralde, and S. Zubillaga.
DIAVAL, a Bayesian expert system for echocardiography.
Artificial Intelligence in Medicine, 10(1):59-73,
1997.

[8] Lesley Fallowfield, D. Ratcliffe, and Robert Souhami.
Clinicians’ attitudes to clinical trials of cancer therapy.
European Journal of Cancer, 33(13):2221-2229, 1997.

[9] John H. Gennari and Madhu Reddy. Participatory design
and an eligibility screening tool. In Proceedings of
the American Medical Informatics Association Annual
Fall Symposium, pages 290-294, 2000.

[10] Carolyn Cook Gotay. Accrual to cancer clinical trials:
Directions from the research literature. Social Science
and Medicine, 33(5):569-577, 1991.

[11] Peter Hammond and Marek J. Sergot. Computer support
for protocol-based treatment of cancer. Journal of
Logic Programming, 26(2):93-111, 1996.

[12] M. Korver and A. R. Janssens. Development and validation
of HEPAR, an expert system for the diagnosis of
disorders of the liver and biliary tract. Medical
Informatics, 16(3):259-270, 1993.

[13] M. Korver and Peter J. F. Lucas. Converting a rule-based
expert system into a belief network. Medical
Informatics, 18(3):219-241, 1993.

[14] Cyrus Kotwall, Leo J. Mahoney, Robert E. Myers, and
Linda Decoste. Reasons for non-entry in randomised
clinical trials for breast cancer: A single institutional
study. Journal of Surgical Oncology, 50:125-129, 1992.

[15] Peter J. F. Lucas. Refinement of the HEPAR expert
system: Tools and techniques. Journal of Artificial
Intelligence in Medicine, 6(2):175-188, 1994.

[16] Peter J. F. Lucas, R. W. Segaar, and A. R. Janssens.
HEPAR: An expert system for the diagnosis of disorders
of the liver and the biliary tract. Liver, 9:266-275, 1989.

[17] Michael D. McNeely and Beverly J. Smith. An interactive
expert system for the ordering and interpretation
of laboratory tests to enhance diagnosis and control
utilization. Canadian Medical Informatics, 2(3):15-19,
1995.

[18] Ian R. Morrison, B. A. Schaefer, and Beverly J. Smith.
Knowledge acquisition: The ACQUIRE approach. In
Proceedings of the First Semi-Annual Conference in Policy
Making and Knowledge Systems, 1991.

[19] Mark A. Musen. Automated Generation of Model-Based
Knowledge Acquisition Tools. Morgan Kaufmann, San
Mateo, CA, 1989.

[20] Mark A. Musen, Samson W. Tu, Amar K. Das, and
Yuval Shahar. EON: A component-based approach to
automation of protocol-directed therapy. Journal of the
American Medical Informatics Association, 3(6):367-388,
1996.

[21] Lucila Ohno-Machado, Eduardo Parra, Suzanne B.
Henry, Samson W. Tu, and Mark A. Musen. AIDS²:
A decision-support tool for decreasing physicians’
uncertainty regarding patient eligibility for HIV treatment
protocols. In Proceedings of the Seventeenth Annual
Symposium on Computer Applications in Medical Care,
pages 429-433, 1993.

[22] Agnieszka Oniśko, Marek J. Druzdzel, and Hanna Wasyluk.
Learning Bayesian network parameters from small
data sets: Application of noisy-OR gates. In Proceedings
of the Workshop on Bayesian and Causal Networks:
From Inference to Data Mining, 2000.

[23] Agnieszka Oniśko, Marek J. Druzdzel, and Hanna Wasyluk.
Application of Bayesian belief networks to diagnosis
of liver disorders. In Proceedings of the Third Conference
on Neural Networks and Their Applications, pages
730-736, 1997.

[24] Constantinos Papaconstantinou, Georgios Theocharous,
and Sridhar Mahadevan. An expert system for assigning
patients into clinical trials based on Bayesian networks.
Journal of Medical Systems, 22(3):189-202, 1998.

[25] Franco Perraro, Paolo Rossi, Carlo Liva, Adolfo Bulfoni,
G. Ganzini, and Adriano Giustinelli. Inappropriate
emergency test ordering in a general hospital: Preliminary
reports. Quality Assurance Health Care, 4:77-81,
1992.

[26] Brigitte Séroussi, Jacques Bouaud, and Éric-Charles
Antoine. Enhancing clinical practice guideline compliance
by involving physicians in the decision process. In
Werner Horn, Yuval Shahar, Greger Lindberg, Steen
Andreassen, and Jeremy C. Wyatt, editors, Artificial
Intelligence in Medicine, pages 76-85. Springer-Verlag,
Berlin, Germany, 1999.

[27] Brigitte Séroussi, Jacques Bouaud, and Éric-Charles
Antoine. Users’ evaluation of ONCODOC, a breast cancer
therapeutic guideline delivered at the point of care.
Journal of the American Medical Informatics Association,
6(5):384-389, 1999.

[28] Brigitte Séroussi, Jacques Bouaud, and Éric-Charles
Antoine. ONCODOC: A successful experiment of
computer-supported guideline development and implementation
in the treatment of breast cancer. Artificial
Intelligence in Medicine, 22(1):43-64, 2001.

[29] Brigitte Séroussi, Jacques Bouaud, Éric-Charles Antoine,
Laurent Zelek, and Marc Spielmann. Using ONCODOC
as a computer-based eligibility screening system
to improve accrual onto breast cancer clinical trials.
In Silvana Quaglini, Pedro Barahona, and Steen
Andreassen, editors, Artificial Intelligence in Medicine,
pages 421-430. Springer-Verlag, Berlin, Germany, 2001.

[30] Edward H. Shortliffe. MYCIN: A Rule-Based Computer
Program for Advising Physicians Regarding Antimicrobial
Therapy Selection. PhD thesis, Computer Science
Department, Stanford University, 1974.

[31] Edward H. Shortliffe, Randall Davis, Stanton G. Axline,
Bruce G. Buchanan, Cordell C. Green, and Stanley
Cohen. Computer-based consultations in clinical
therapeutics: Explanation and rule acquisition capabilities
of the MYCIN system. Computers and Biomedical
Research, 8:303-320, 1975.

[32] Edward H. Shortliffe, A. Carlisle Scott, Miriam B.
Bischoff, William van Melle, and Charlotte D. Jacobs.
ONCOCIN: An expert system for oncology protocol
management. In Proceedings of the Seventh International
Joint Conference on Artificial Intelligence, pages 876-881,
1981.

[33] Beverly J. Smith and Michael D. McNeely. The influence
of an expert system for test ordering and interpretation
on laboratory investigations. Clinical Chemistry,
45(8):1168-1175, 1999.

[34] Georgios Theocharous. An expert system for assigning
patients into clinical trials based on Bayesian networks.
Master’s thesis, Computer Science and Engineering
Department, University of South Florida, 1996.

[35] Samson W. Tu, Carol A. Kemper, Nancy M. Lane,
Robert W. Carlson, and Mark A. Musen. A methodology
for determining patients’ eligibility for clinical trials.
Journal of Methods of Information in Medicine,
32(4):317-325, 1993.

[36] Carl Van Walraven and C. David Naylor. Do we know
what inappropriate laboratory utilization is? A systematic
review of laboratory clinical audits. Journal of the
American Medical Association, 280(6):550-558, 1998.

[37] Haiqin Wang and Marek J. Druzdzel. User interface
tools for navigation in conditional probability tables and
elicitation of probabilities in Bayesian networks. In
Proceedings of the Sixteenth Conference on Uncertainty in
Artificial Intelligence, pages 617-625, 2000.

[38] Salim Yusuf, Peter Held, K. K. Teo, and Elizabeth R.
Toretsky. Selection of patients for randomized controlled
trials: Implications of wide or narrow eligibility criteria.
Statistics in Medicine, 9:73-86, 1990.


Knowledge Acquisition for Clinical-Trial Selection

Savvas Nikiforou, Eugene Fink, Lawrence O. Hall, Dmitry B. Goldgof, and Jeffrey P. Krischer

nikiforo@csee.usf.edu, eugene@csee.usf.edu, hall@csee.usf.edu,
goldgof@csee.usf.edu, jpkrischer@moffitt.usf.edu

Computer Science and Engineering, University of South Florida, Tampa, Florida 33620


Abstract — When medical researchers test a new
treatment procedure, they recruit patients with appropriate
medical histories. An experiment with a
new procedure is called a clinical trial. The selection
of patients for clinical trials has traditionally been a
labor-intensive task, which involves the matching of
medical records with a list of eligibility criteria, and
studies have shown that clinicians can miss up to
60% of the eligible patients. A recent project at the
University of South Florida has been aimed at the
automation of this task. We have developed an intelligent
agent that selects trials for eligible patients.
We report the work on the representation and entry
of the related knowledge about clinical trials. We
describe the structure of the agent’s knowledge base
and the interface for adding new trials.

Keywords — Knowledge representation, medical expert
systems, user interfaces.

I.  Introduction 

Cancer causes 550,000 deaths in the United States
every year, and the treatment of cancer is an active
research area. Medical experts explore new treatment
methods, such as drugs, surgery techniques, and radiation
therapies. An experiment with a new treatment
procedure is called a clinical trial. When researchers
conduct a trial, they recruit patients with an appropriate
cancer type and medical history. The selection
of patients has traditionally been a manual procedure,
and studies have shown that clinicians can miss up to
60% of the eligible patients [12, 22, 30].

A recent project at the University of South Florida
has been aimed at automatic selection of patients for
clinical trials. We have developed an intelligent agent
that prompts a clinician for a patient’s data and identifies
all matching trials [1, 11]. It includes a knowledge
base with information about available clinical trials,
criteria for selecting patients, and related medical tests.

We report the work on a web-based interface that enables
a clinician to enter new trials without the help
of a programmer. We have used the interface to build
a knowledge base for clinical trials at the Moffitt Cancer
Center, located at the University of South Florida.
We review the previous work on medical expert systems
(Section II), explain the knowledge representation in the
developed agent (Section III), and describe the interface
for adding new knowledge (Section IV).

II.  Previous  Work 

Researchers began to work on medical applications
of artificial intelligence in the early seventies. Shortliffe
and his colleagues developed the MYCIN system,
which diagnosed bacterial diseases [5, 25, 26]. Experiments
showed the effectiveness of MYCIN, which led to
the development of other medical systems [5, 14], such
as NEOMYCIN, PUFF, CENTAUR, and VM.

Musen et al. built a rule-based system, called EON,
that selected AIDS patients for clinical trials [17].
Ohno-Machado et al. developed the AIDS² system, which also
assigned AIDS patients to clinical trials [19]. Bouaud et
al. created a cancer expert system, called ONCODOC,
that suggested alternative trials for each patient and
allowed a physician to choose among them [3, 4]. Séroussi
used ONCODOC to select participants for clinical trials
at two hospitals, which helped to increase the number
of selected patients by a factor of three [23, 24].

Early expert systems did not have knowledge-acquisition
tools, and programmers hand-coded the related
rules. To simplify knowledge entry, researchers
implemented specialized tools for some systems [13, 15].

Eriksson pointed out the need for tools that would allow
efficient knowledge acquisition, and described a system
for building such tools [6]. Tallis et al. developed a
library of scripts for modifying knowledge bases, which
helped to enforce the consistency of the modified knowledge
[7, 27, 28, 29]. Kim and Gil considered the use
of scripts for building new knowledge-acquisition tools,
and created a system for evaluating these tools [9, 10].
Blythe et al. designed a general knowledge-acquisition
interface based on previous techniques [2].

Musen developed the PROTÉGÉ environment for creating
knowledge-acquisition tools [14, 16], which proved
effective for the development of knowledge systems,
including the AIDS expert systems [20], asthma treatment
selection [8], and elevator-design rules [21].

III.  Knowledge  Base 

Physicians at the Moffitt Cancer Center have about
150 clinical trials available for cancer patients. They
have identified criteria that determine a patient’s eligibility
for each trial, and they use these criteria to select
trials for eligible patients. Traditionally, physicians have
selected trials by a manual analysis of patients’ data.
The review of resulting selections has shown that they
usually do not check all clinical trials and occasionally
miss an appropriate trial.

To  address  this  problem,  we  have  built  an  intelligent 
agent  that  helps  to  select  trials  for  each  patient.  It 
prompts  a  clinician  to  enter  the  results  of  medical  tests, 
and  uses  them  to  identify  appropriate  trials. 

In Figure 1(a), we give a simplified example of eligibility
criteria for a clinical trial. This trial is for young and
middle-aged women with a noninvasive cancer at stage
II or III. When testing a patient’s eligibility, a clinician
has to order three medical tests (Figure 1(b)). The agent
first prompts the clinician to enter the patient’s sex and
age. If the patient satisfies the corresponding conditions,
the agent asks for the mammogram results and
verifies Conditions 3 and 4; then, it requests the biopsy
and electrocardiogram data.


(a) Eligibility criteria

1. The patient is female.

2. She is at most forty-five years old.

3. Her cancer stage is II or III.

4. Her cancer is not invasive.

5. At most three lymph nodes have tumor cells.

6. Either
   • the patient has no cardiac arrhythmias, or
   • all tumors are smaller than 2.5 centimeters.

(b) Tests and questions

General information
   What is the patient’s sex?
   What is the patient’s age?

Mammogram, cost is $150
   What is the cancer stage?
   Does the patient have invasive cancer?

Biopsy, cost is $300
   What is the cancer stage?
   How many lymph nodes have tumor cells?
   What is the greatest tumor diameter?

Electrocardiogram, cost is $200
   Does the patient have cardiac arrhythmias?

(c) Eligibility expression

sex = FEMALE and
age ≤ 45 and
cancer-stage ∈ {II, III} and
invasive-cancer = NO and
lymph-nodes ≤ 3 and
(arrhythmias = NO or
tumor-diameter < 2.5)


Fig. 1. Example of eligibility criteria, tests, and questions.

The agent’s knowledge base includes questions, tests,
and logical expressions that represent eligibility for each
trial. We give an example of tests and questions in
Figure 1(b), and a logical expression in Figure 1(c).

The agent supports three types of questions; the first
type takes a yes/no response, the second is multiple
choice, and the third requires a numeric answer. For
example, the cancer stage is a multiple-choice question,
and the tumor diameter is a numeric question. The
description of a medical test includes the test name, dollar
cost, and list of questions that can be answered based
on the test results. For instance, the mammogram in
Figure 1 has a cost of $150, and it allows the answering
of two questions. Different tests may answer the same
question; for example, both mammogram and biopsy
show the cancer stage.


We encode the eligibility for a clinical trial by a logical
expression, which may include variables that represent
the available medical data, as well as equalities,
inequalities, “set-element” relations, conjunctions, and
disjunctions. For example, we encode the criteria in
Figure 1(a) by the expression in Figure 1(c).

The agent collects data until it can determine whether
the eligibility expression is TRUE or FALSE. For instance,
if a patient’s sex is male, then the expression in
Figure 1(c) is FALSE, and the agent immediately rejects this
trial. If the sex is female, the agent has to ask more
questions. If the knowledge base includes many clinical
trials, the agent checks a patient’s eligibility for each of
them. It first asks for the tests related to multiple trials,
and then requests additional tests for specific trials.
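
The early-rejection behavior described above can be sketched as a three-valued evaluation, in which an expression is TRUE, FALSE, or still undetermined (`None`). The nested-tuple encoding and the function name `evaluate` are assumptions for illustration, not the agent's actual implementation.

```python
def evaluate(expr, data):
    """Evaluate an eligibility expression over partially known data.

    Returns True or False when the answers collected so far decide the
    expression, and None when more questions have to be asked.
    """
    op = expr[0]
    if op == "atom":
        _, variable, predicate = expr
        return predicate(data[variable]) if variable in data else None
    results = [evaluate(sub, data) for sub in expr[1:]]
    if op == "and":
        if any(r is False for r in results):
            return False
        return True if all(r is True for r in results) else None
    if op == "or":
        if any(r is True for r in results):
            return True
        return False if all(r is False for r in results) else None
    raise ValueError(f"unknown operator: {op}")


# The expression of Figure 1(c) in the assumed encoding.
FIGURE_1C = (
    "and",
    ("atom", "sex", lambda v: v == "FEMALE"),
    ("atom", "age", lambda v: v <= 45),
    ("atom", "cancer-stage", lambda v: v in {"II", "III"}),
    ("atom", "invasive-cancer", lambda v: v == "NO"),
    ("atom", "lymph-nodes", lambda v: v <= 3),
    ("or",
     ("atom", "arrhythmias", lambda v: v == "NO"),
     ("atom", "tumor-diameter", lambda v: v < 2.5)),
)

rejected = evaluate(FIGURE_1C, {"sex": "MALE"})     # decided at once
undecided = evaluate(FIGURE_1C, {"sex": "FEMALE"})  # more questions needed
eligible = evaluate(FIGURE_1C, {
    "sex": "FEMALE", "age": 40, "cancer-stage": "II",
    "invasive-cancer": "NO", "lymph-nodes": 2, "arrhythmias": "NO",
})
print(rejected, undecided, eligible)
```

Each node of the expression is visited once per call, so the cost of one evaluation grows linearly with the size of the expression. In the last call the tumor diameter is never supplied, because the disjunction is already satisfied by the arrhythmia answer.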

IV.  Entering Eligibility Criteria

We have designed a web-based interface for adding
new clinical trials [18], which consists of two main parts;
the first part is for adding information about medical
tests (Figure 2), and the second is for eligibility criteria
(Figure 3). The interface includes ten screens; two
of them are “start screens,” which can be reached from
any other screen. We give an example of entering
eligibility criteria, describe the two parts of the interface,
and present experiments on its effectiveness.

Example: Suppose that a user needs to enter the criteria
shown in Figure 1. First, she utilizes the “Adding
tests” screen to enter the three tests (Figure 4). Then,
she adds the related questions; to enter questions for
a specific test, she selects the test and clicks “Modify”
(Figure 4), and the agent displays the “Modifying a test”
screen (Figure 5). To add a question, she clicks the
appropriate button at the bottom (Figure 5) and then
types the question (Figure 6).

After adding the questions for all tests, the user goes
to the “Adding clinical trials” screen and initializes a
new trial (Figure 7). She gets the “Selecting tests”
screen and chooses the tests related to the current trial
(Figure 8). Then, she marks relevant questions and the
answers that make a patient eligible (Figure 9). If the
eligibility criteria include disjunctions, she has to use the
screen for composing logical expressions (Figure 10).

Tests and questions: The interface for adding tests
and questions includes six screens (Figure 2). The start
screen is for viewing the available tests and defining new
ones, whereas the other screens are for modifying tests
and adding questions.

We show the start screen in Figure 4; its left-hand side
allows viewing questions and going to a modification
screen. If the user selects a test and clicks “View,” the
agent shows the questions related to this test. If the user
clicks “Modify,” it displays the “Modifying a test” screen
(Figure 5). The right-hand side of the start screen allows
adding a new test by specifying its name and cost.

The “Modifying a test” screen shows the information
about a specific test, which includes the test name, cost,
and related questions. The user can change the test
name and cost; the four bottom buttons allow moving
to the screens for adding and deleting questions.


Fig. 2. Entering tests and questions. We show the screens by rectangles and the transitions between them by arrows. The
bold rectangle is the start screen.


Fig.  3.  Entering  eligibility  criteria. 

Fig. 4. Adding a new test.

Fig. 5. Modifying a test; the bottom buttons are for moving to question-entry screens.

Fig. 7. Adding a new clinical trial.


Fig. 9. Selecting questions and answers. The user checks the questions for the current clinical trial and marks the answers
that satisfy the eligibility criteria.


Fig. 10. Combining questions into a logical expression.


We show the screens for adding yes/no and multiple-choice
questions in Figure 6; the screen for numeric questions
is similar. The user can enter a new question for
the current test, along with a set of allowed answers. If
the question is also related to other tests, the user has to
mark them in the lower box. The “Deleting questions”
screen is for removing old questions.

Eligibility conditions: The mechanism for entering
eligibility criteria consists of four screens (Figure 3).
The start screen allows the user to initialize a new clinical
trial and view the criteria for old trials. If the
user needs to modify a clinical trial, the agent first
displays the test-selection screen (Figure 8). The user
then chooses related tests and question types, and clicks
“Continue” to get the question list.

The next screen (Figure 9) allows the user to select
specific questions and mark the answers that make a
patient eligible. For a multiple-choice question, the user
may specify several eligibility options; for example, a
patient may be eligible if her cancer stage is II or III.
For a numeric question, the user has to specify a range
of values; for instance, a patient may be eligible if her
age is between 0 and 45 years. If the user clicks “Simple
questions,” the agent generates a conjunction of the
selected criteria. If the eligibility conditions involve a
more complex expression, the user has to click “Combined
question” and then use the screen for composing
logical expressions (Figure 10).


Fig. 11. Entry time for test sets (left) and the mean time per question for each set (right). We plot the average time (dashed
lines) and the time of the fastest and slowest users (vertical bars).


Fig. 12. Entry time for eligibility criteria. We show the average time for each clinical trial and the time per question (dashed
lines), along with the performance of the fastest and slowest users (vertical bars).

Entry time: We have run experiments with sixteen
novice users, who had no prior experience with the interface.
First, every user has entered four sets of medical
tests; each set has included three tests and ten questions.
Then, each user has added eligibility expressions
for ten clinical trials used at the Moffitt Cancer Center;
the number of questions in an eligibility expression has
varied from ten to thirty-five.

We have measured the entry time for each test set and
each eligibility expression. In Figure 11, we show the
mean time for every test set and the time per question
for the same sets. All users have entered the test sets
in the same order, from 1 to 4; since they had no prior
experience, their performance has improved during the
experiment. In Figure 12, we give similar graphs for the
entry of eligibility expressions.

The experiments have shown that novices can efficiently
use the interface; they quickly learn its full functionality,
and their learning curve flattens after about
an hour. The average time per question is 31 seconds
for the entry of medical tests and 37 seconds for eligibility
criteria, which means that a user can enter all 150
cancer trials used at Moffitt in about two weeks.


V.  Concluding  Remarks 

We have developed knowledge-acquisition tools for an
agent that automatically assigns cancer patients to clinical
trials. We have described the representation of eligibility
criteria and a web-based interface for adding new
trials. The experiments have shown that a user can enter
a new trial in fifteen to thirty minutes. Novices can
use the interface without prior instructions, and they
reach their full speed after about an hour. Although
cancer research at Moffitt has provided the motivation
for this work, the agent is not limited to cancer, and we
can use it for trials related to other diseases.
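
As a rough consistency check of the reported entry times, the sketch below converts the per-trial figure into a total for the 150 Moffitt trials. The forty-hour work week is an assumption introduced here; the source states only the per-trial time and the overall estimate of about two weeks.

```python
TRIALS = 150                    # cancer trials available at Moffitt
MINUTES_PER_TRIAL = (15, 30)    # reported range for entering one trial
HOURS_PER_WORK_WEEK = 40        # assumption, not stated in the source


def total_work_weeks(minutes_per_trial):
    """Total entry time for all trials, expressed in work weeks."""
    hours = TRIALS * minutes_per_trial / 60
    return hours / HOURS_PER_WORK_WEEK


low, high = (total_work_weeks(m) for m in MINUTES_PER_TRIAL)
print(f"{low:.2f} to {high:.2f} work weeks")
```

Under this assumption the total is between roughly one and two work weeks of uninterrupted entry, which agrees with the estimate of about two weeks once the entry of the related tests and questions is included.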
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